Abstract

A fundamental problem in dynamic frequency reuse is that the cognitive radio is ignorant of the 
amount of interference it inflicts on the primary license holder. A model for such a situation is proposed 
and analyzed. The primary sends packets across an erasure channel and employs simple ACK/NAK 
feedback (ARQs) to retransmit erased packets. Furthermore, its erasure probabilities are influenced by 
the cognitive radio's activity. While the cognitive radio does not know these interference characteristics, 
it can eavesdrop on the primary's ARQs. The model leads to strategies in which the cognitive radio
adaptively adjusts its input based on the primary's ARQs, thereby guaranteeing the primary exceeds a
target packet rate. A relatively simple strategy whereby the cognitive radio transmits only when the
primary's empirical packet rate exceeds a threshold is shown to have interesting universal properties in
the sense that for unknown time-varying interference characteristics, the primary is guaranteed to meet
its target rate. Furthermore, a more intricate version of this strategy is shown to be capacity-achieving
for the cognitive radio when the interference characteristics are time-invariant.

I. Introduction 

Systems often need to be designed so that they do not disrupt pre-existing systems with which they 
interact. This backwards compatibility problem is a central issue in the study of cognitive radio systems. 
A cognitive radio is a device that can sense and adjust its power, frequency band, etc. to peacefully 
coexist with other radios with which it shares spectrum [1]. The FCC and international regulatory bodies 
are considering modifying their rules to allow for such systems to occupy unlicensed bands or to share 
bands with licensed, predesigned communication systems. These licensed users are often called primaries, 
legacy systems, or incumbents. 

The aim of this paper is to study sharing spectrum with legacy systems, in which the backwards
compatibility problem arises. One potential solution is to transmit on a band that is currently unoccupied 
and to leave that band once a primary is detected. For these "detect-and-avoid" systems, one research aim 
is to understand the feasibility of detecting the presence of a primary system subject to noise uncertainty 
and quantization effects [2], [3], [4]. 

A different approach is for the cognitive radio to occupy bands on which the primary is already active 
but in such a way as to mitigate the interference generated on the primary system. Two such information- 
theoretic models have been introduced to study cognitive radio and spectrum sharing systems. The first is 
sometimes called the cognitive radio channel [5], [6], [7], [8], [9], [10], [11]. This channel is a variation on 
the two-user interference channel [12], [13], [14], [15] with the modification that the cognitive radio (one 
of the transmitters) knows the message that the primary (the other transmitter) will send. Among these 
papers, Devroye, Mitran, and Tarokh [5] as well as Jovicic and Viswanath [6] consider a Gaussian scenario 
in which the primary's strategy can be thought of as a fixed, predesigned legacy system. Specifically, 
they show that for their setup, there is an optimal achievable strategy that enables the primary to continue 
using a point to point Gaussian codebook. The result highlights the fact that in cognitive radio problems, 
one may not have the flexibility to modify the primary's design and must instead design the cognitive 
radio in such a way that the primary continues to meet its target performance. The second approach 
is to consider the capacity of systems with a constraint on the interference power generated at certain 
locations. The assumption is that the primary systems that occupy these locations will be able to handle 
this level of interference [16], [17]. 

We take inspiration from these two models in the following example, which forms the starting point 
of the current investigation. It is a model of the practically most interesting case, where the cognitive 
transmitter is close to the primary receiver, thus creating substantial interference. For simplicity, we assume 
the cognitive radio's receiver is shadowed from the primary transmitter, thereby avoiding interference from 
that system. An illustration of the setup is given in Figure 1. 

Example 1: Suppose the primary sends packets across an erasure channel and receives feedback from
its receiver to retransmit the packet or send the next one. The cognitive radio, on the other hand, has a
noiseless channel to its receiver with $P + 1$ channel inputs divided into two classes: a silent symbol $x_{\mathrm{off}}$
results in a successful receipt of the primary's transmitted packet, and the remaining $P$ transmit symbols
cause the primary's packet to be erased. Suppose the primary wants a guaranteed rate of 1/2; that is, one
packet should be successfully received per two transmissions on average. By simply alternating channel



June 9, 2008 



DRAFT 






uses between the silent symbol and sending information with the $P$ transmit symbols, the cognitive radio
guarantees the primary its rate-1/2 target and can itself achieve a rate of

$$\frac{1}{2} \log P , \tag{1}$$

where $P + 1$ is the number of channel input symbols available to the cognitive radio.



In the spirit of the previous work, Example 1 considers a primary that is unaware of the cognitive 
radio. However, Example 1 makes the more dubious assumption that the cognitive radio knows what 
the primary's erasure probabilities are for its two classes of inputs. As a result, the strategy presented 
is not robust to deviations from the erasure probabilities provided in the example. For instance, if the 
primary's erasure probability for the silent symbol $x_{\mathrm{off}}$ is $\epsilon_0 > 0$, the strategy outlined will not allow the
primary to meet its rate 1/2 target. The issue is that the cognitive radio will not be able to directly 
estimate the interference it creates for the primary. Such estimates are generally obtained by training via 
pilot symbols, but the primary receiver is unlikely to train with the cognitive radio transmitter. However, 
certain kinds of 1-bit feedback have been shown to be sufficient for beamforming [18], [19]. We adopt 
this insight as we build on Example 1 by introducing both an uncertainty and sensing component to the 
problem. 

Fig. 1. An example of the type of channel model for our cognitive radio system in which the primary has message $W_p$ and
the cognitive radio has message $W_s$. The cognitive radio transmitter can listen to the ARQ feedback the primary receiver sends
to its transmitter to adapt its transmission rate and reduce interference on the primary system.



Example 1, continued: The cognitive radio's silent symbol now induces an erasure probability $\epsilon_0 < 1/2$,
and its transmit symbols induce an erasure probability of $\epsilon_1 > 1/2$, both of which are unknown to the
cognitive radio. However, the cognitive radio transmitter can sense the primary's ARQs, which we will
denote with the indicator random variables $A_k$, where $A_k = 1$ when the primary's $k$-th transmission is received. Figure
1 shows a schematic block diagram of this setup.

The cognitive radio's strategy is as follows. If the primary is exceeding its rate target at time $k$, the
cognitive radio sends one of its transmit symbols based on its message. This happens when
$\frac{1}{k} \sum_{i=1}^{k} A_i > \frac{1}{2}$. Otherwise, the cognitive radio sends its silent symbol. Let $\Gamma_k$ be the indicator that the cognitive
radio sends a transmit symbol at time $k$. Thus, $P(A_k = 1 \mid \Gamma_k) = 1 - \epsilon_{\Gamma_k}$. Then the cognitive radio's rate
at time $n$ is

$$\frac{1}{n} \sum_{k=1}^{n} \Gamma_k \log P . \tag{2}$$

Note that for $\epsilon_0 = 0$, $\epsilon_1 = 1$, this strategy is as good as the one outlined in Example 1.
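
The threshold strategy above can be explored numerically. The following is a minimal simulation sketch, not the authors' implementation. It assumes the decision at time $k$ uses only the ARQs observed through time $k-1$ (a causal reading of the rule), and the function name, the seed, and the particular erasure probabilities in the usage note are hypothetical choices made for illustration.

```python
import random

def simulate_threshold_strategy(eps0, eps1, n, seed=0):
    """Simulate the threshold rule of Example 1 (continued).

    eps0, eps1: erasure probabilities induced by the silent and transmit
    symbols. They are used only to generate ARQs; the decision rule itself
    never reads them.
    Returns (primary packet rate, fraction of uses with a transmit symbol).
    """
    rng = random.Random(seed)
    received = 0   # primary packets received so far
    active = 0     # channel uses on which a transmit symbol was sent
    for k in range(1, n + 1):
        # Assumption: transmit iff the empirical rate over the first k-1
        # uses strictly exceeds the target 1/2.
        transmit = 2 * received > (k - 1)
        eps = eps1 if transmit else eps0
        a_k = 0 if rng.random() < eps else 1  # ARQ: 1 = received, 0 = erased
        received += a_k
        active += transmit
    return received / n, active / n
```

For instance, `simulate_threshold_strategy(0.1, 0.9, 200000)` should return a primary rate close to 1/2; multiplying the returned active fraction by $\log P$ gives the cognitive radio's empirical rate in (2).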

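What can we say about the rate for the primary and cognitive radio in Example 1? Let $S_0 = 0$ and
$S_k = S_{k-1} + (A_k - 1/2)$ represent the difference between the number of packets the primary has received
by time $k$ and its targeted number of packets by time $k$ based on a target packet rate of 1/2. Suppose the
$A_k$ are independent in $k$. Then $S_k$ is a positive recurrent Markov chain and is nonnegative if and only
if $\Gamma_k = 1$, which can be verified by confirming that its stationary distribution is

$$\pi_{i/2} = \begin{cases} \dfrac{(2\epsilon_1 - 1)(1 - 2\epsilon_0)}{2\epsilon_1(\epsilon_1 - \epsilon_0)} \left( \dfrac{1 - \epsilon_1}{\epsilon_1} \right)^{i} , & i \ge 0 , \\[2ex] \dfrac{(2\epsilon_1 - 1)(1 - 2\epsilon_0)}{2(1 - \epsilon_0)(\epsilon_1 - \epsilon_0)} \left( \dfrac{1 - \epsilon_0}{\epsilon_0} \right)^{i+1} , & i < 0 . \end{cases} \tag{3}$$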

We can make the following statement. 

Fact: Suppose $S_0$ is distributed according to $\pi$. Then for all $k \ge 1$,

$$P(\Gamma_k = 1) = \sum_{i \ge 0} \pi_{i/2} = \frac{1/2 - \epsilon_0}{\epsilon_1 - \epsilon_0} . \tag{4}$$

The fact allows us to get a handle on the cognitive radio's rate. Furthermore, the primary's expected
rate is

$$k^{-1} \sum_{i=1}^{k} \sum_{j=0,1} P(A_i = 1 \mid \Gamma_i = j) \, P(\Gamma_i = j) = \frac{1}{2} . \tag{5}$$
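
As a sanity check on (3) through (5), the stationary distribution can be summed numerically. The sketch below assumes the reading of (3) in which state $i$ corresponds to $S_k = i/2$ and the cognitive radio transmits from states $i \ge 0$; the function names and the truncation depth are illustrative assumptions, not part of the paper.

```python
def stationary_pi(i, eps0, eps1):
    """Stationary probability of state i (S_k = i/2), assuming the radio
    transmits from states i >= 0 and is silent from states i < 0."""
    c = (2 * eps1 - 1) * (1 - 2 * eps0) / (2 * (eps1 - eps0))
    if i >= 0:
        return c / eps1 * ((1 - eps1) / eps1) ** i
    # Equivalent to ((1 - eps0) / eps0) ** (i + 1), but safe for eps0 = 0.
    return c / (1 - eps0) * (eps0 / (1 - eps0)) ** (-(i + 1))

def summarize(eps0, eps1, depth=400):
    """Truncated sums: total mass, P(transmit), and primary's expected rate."""
    total = sum(stationary_pi(i, eps0, eps1) for i in range(-depth, depth + 1))
    p_active = sum(stationary_pi(i, eps0, eps1) for i in range(0, depth + 1))
    primary_rate = p_active * (1 - eps1) + (1 - p_active) * (1 - eps0)
    return total, p_active, primary_rate
```

With $\epsilon_0 = 0.2$ and $\epsilon_1 = 0.7$, the total mass is 1, the transmit probability is $(1/2 - 0.2)/(0.7 - 0.2) = 0.6$, and the primary's expected rate is 1/2, in agreement with (4) and (5).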

Note that this strategy does not depend on the cognitive radio knowing the values $\epsilon_0$ and $\epsilon_1$ a priori.
However, the cognitive radio does know the primary's rate target, which is 1/2 in this example. In the 
remainder of the paper, we assume the primary's rate target is known in advance to the cognitive radio, 
but the primary's erasure probabilities are unknown. 

In this work, we consider optimal coding strategies for the case in which the primary is a packet
erasure system as described in Example 1.$^1$ For the channel of the cognitive radio, we consider a more
general class of (noisy) channels. As we show, the primary can meet its target rate even if the cognitive
radio is active for a certain fraction of channel uses. This interference budget available to the cognitive
radio, while unknown a priori, can be estimated via the primary ARQs and rate target, which are known
at the cognitive radio encoder. One can determine the capacity of the cognitive radio in terms of this
interference budget, which we call the rate-interference budget (RIB) tradeoff function. We show an
achievable strategy for the general case in which the primary's packet erasure probabilities can fluctuate
and find a matching converse for the RIB function when they do not.

$^1$ This formulation lends itself well to many spectrum sharing problems in which the primary is a separately designed system
and whose exact implementation is partially obscured from the cognitive radio.

In Section II, we define the problem we are considering precisely, including the channel model for the 
cognitive radio and the allowable coding strategies that the cognitive radio can adopt. These strategies 
force the cognitive radio to provide guarantees about the primary's rate that do not depend on the time 
horizon that the cognitive radio uses to measure its own rate (horizon-independence condition) and force
it to be robust to fluctuations in the primary's packet erasure probabilities (robustness condition). 

In Section III, we show how to refine the strategy from Example 1 to provide such guarantees that 
also allow positive rate for the cognitive radio, which leads to two new strategies: the fixed-codebook 
protocol and the codebook-adaptive protocol. In Section IV, we present a converse when the erasure 
probabilities are time-invariant, which matches the rates achievable by the codebook-adaptive protocol 
proposed in Section III. Section V revisits Example 1 in the introduction and considers new ones. Section 
VI concludes the paper with a discussion of our contributions and future work. 

II. Problem Setup and Main Result 

Capital letters $X, Y, Z$ represent random variables and calligraphic letters $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$ denote finite sets.
We will focus on discrete memoryless channels in this work, but potential extensions to Gaussian channels
will be discussed in Section VI. For convenience, $p(x)$ is the probability distribution of $X$ at $x$. Similarly,
$p(y|x)$ is the conditional probability distribution of $Y$ at $y$ given $X = x$. Notation for entropy $H(X)$,
mutual information $I(X;Y)$, etc. is consistent with the notation of Cover and Thomas [20].

Fig. 2. Equivalent channel model from the cognitive radio's perspective. $A_i$ is an indicator random variable: $A_i = 1$ means
that the packet sent by the primary at time $i$ was successfully received.

A. Equivalent Channel Model

As a legacy ARQ system, the primary is assumed to have the following fixed strategy. At time $i$, it
sends a packet to its receiver and receives feedback $A_i$ to indicate whether the packet was erased or
successfully received ($A_i = 0$ or 1, respectively). If the packet is erased, the primary retransmits the
same packet at time $i + 1$. If the packet is successfully received, the primary transmits a new packet at
time $i + 1$. Thus, we will refer to $A_i$ as the primary's ARQ feedback.

Since the primary's strategy is fixed, we now have to design the cognitive radio's strategy. Figure 2 
illustrates this problem; the primary merely appears as a constraint on the cognitive radio in the shape 
of Ai. That is, in addition to communicating, the cognitive radio must also control its channel inputs to 
guarantee the primary's rate, i.e. such that to the first order$^2$,

$$k^{-1} \sum_{i=1}^{k} A_i \ge R_p , \tag{6}$$

where $R_p$ is the desired and prespecified performance of the primary system. Furthermore, this control
must be robust to fluctuations in the channel between the cognitive radio transmitter and primary receiver. 
Thus, the primary's ARQ feedback provides a means for the cognitive radio to apply this control. 

B. Channel Model and Coding 

We now consider the DMC with feedback from Figure 2 in more detail. Let $\mathcal{X} = \{x_{\mathrm{off}}, 1, \ldots, |\mathcal{X}| - 1\}$
be the channel inputs. Then at time $i$, the conditional distribution of the channel output and primary's
ARQ $A_i$ given $X_i = x$ can be expressed as

$$p(y_i, a_i \mid x_i = x) = p(y_i \mid x_i = x) \cdot \epsilon_{x,i} \cdot \exp\left( a_i \cdot \log \frac{1 - \epsilon_{x,i}}{\epsilon_{x,i}} \right) . \tag{7}$$
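
The exponential factor in (7) is simply a compact way of writing a Bernoulli probability for the ARQ bit. The short sketch below evaluates it directly; the function names and the list-of-lists representation of $p(y|x)$ are assumptions made for illustration, and the expression requires $0 < \epsilon_{x,i} < 1$ as written.

```python
import math

def arq_factor(a, eps):
    """ARQ part of (7): equals eps when a = 0 (erasure), 1 - eps when a = 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("the form in (7) needs 0 < eps < 1")
    return eps * math.exp(a * math.log((1.0 - eps) / eps))

def joint_pmf(y, a, x, channel, eps_x):
    """p(y, a | x) as in (7), with channel[x][y] = p(y | x)."""
    return channel[x][y] * arq_factor(a, eps_x)
```

Summing `joint_pmf` over all outputs $y$ and both values of $a$ gives 1 for any fixed input, which confirms that (7) is a valid conditional distribution.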

We assume that the sequences $\{\epsilon_{x,i}\}_{i=1}^{\infty}$ for $x \in \mathcal{X}$ are unknown at the encoder and decoder. For
simplicity, we assume

$$\epsilon_{x_{\mathrm{off}},i} = \epsilon_0 \tag{8}$$

does not depend on $i$, which assumes that the primary's channel is fixed when the cognitive radio is
silent.

However, allowing $\epsilon_{x,i}$ to vary with $i$ for $x \ne x_{\mathrm{off}}$ reflects uncertainty about the amount of interference
the cognitive radio is generating on the primary. We will assume that for all $x \ne x_{\mathrm{off}}$ and $i = 1, 2, \ldots$,

$$\epsilon_{x,i} > \epsilon_0 . \tag{9}$$

$^2$ Second-order issues and tight delay constraints are discussed in Section VI.

In the remainder of this paper, we make the technical assumption that there is a known constant $\nu > 0$
such that for all $i$, $R_p < 1 - \epsilon_0 - \nu$. This assumption enables the primary to tolerate some interference
from the cognitive radio while guaranteeing the cognitive radio achieves a positive rate.

The definition of the rate and capacity for the secondary is complicated by the fact that the number
of channel uses depends on the realizations of $\epsilon_{x,i}$. Therefore, we need to be precise on what is meant by
messages. We define the set of possible messages to be the set of binary sequences $\{0, 1\}^{n C_{\max}}$, where
$C_{\max} = \log \min\{|\mathcal{X}|, |\mathcal{Y}|\}$. Let $W^k$ be the first $k$ bits of the message and $W = W^{n C_{\max}}$.

An $(n, f^n, g)$ code (we call $n$ the blocklength) consists of a set of encoding functions
$f_i : \{0, 1\}^{i-1} \times \{0, 1\}^{n C_{\max}} \to \mathcal{X}$ for $i = 1, 2, \ldots, n$,

$$X_i = f_i(A^{i-1}, W) , \tag{10}$$

and a decoding function $g : \mathcal{Y}^n \to \{0, 1\}^{n C_{\max}}$,

$$\hat{W} = g(Y^n) . \tag{11}$$

A strategy is a sequence of $(n, f^n, g)$ codes indexed by $n$ on the positive integers $n = 1, 2, \ldots$.
Strategies must respect the primary's rate target, so the following definition restricts the type of
strategies we allow.

A strategy is valid if for all $\nu > 0$ and for $k < n$ in each $(n, f^n, g)$ code,

$$P\left( k^{-1} \sum_{i=1}^{k} A_i < R_p \right) \le K_{1,R_p,\nu} \, k \, e^{-k K_{2,R_p,\nu}} , \tag{12}$$

where the constants $K_{1,R_p,\nu} < \infty$, $0 < K_{2,R_p,\nu} < \infty$ depend only on the fat in the system $\nu$ and target
rate $R_p$, and the right side of (12) goes to 0 as $k \to \infty$.

Note that a valid strategy imposes two restrictions. First, the convergence of the primary's rate should
not depend on the blocklength of a strategy (horizon-independence condition). Second, the convergence of
the primary's rate should be the same irrespective of $\epsilon_0$, $\{\epsilon_{x,i}\}_{i=1}^{\infty}$, $x \in \mathcal{X} - \{x_{\mathrm{off}}\}$ (robustness condition).
For a given valid strategy, we will use the notation $\hat{W}_n$ to denote the decoded output for its code of
blocklength $n$.

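A rate $R$ is achievable if for all $\delta > 0$, there exists a valid strategy and $n_0(\delta, \epsilon_0, \{\epsilon_{x,i}\}_{i=1}^{\infty}, R_p)$ such
that for the strategy's codes with blocklength $n > n_0$,

$$P\left( \hat{W}_n^{\lfloor nR \rfloor} \ne W^{\lfloor nR \rfloor} \right) \le \delta . \tag{13}$$

The set of achievable $R$ is denoted as $\mathcal{R}(\epsilon_0, \{\epsilon_{x,i}\}_{i=1}^{\infty}, R_p)$.

The rate-interference budget (RIB) function $R_{\mathrm{IB}}(\epsilon_0, \{\epsilon_{x,i}\}_{i=1}^{\infty}, R_p)$ is defined as

$$R_{\mathrm{IB}}(\epsilon_0, \{\epsilon_{x,i}\}_{i=1}^{\infty}, R_p) = \sup_{R \in \mathcal{R}(\epsilon_0, \{\epsilon_{x,i}\}_{i=1}^{\infty}, R_p)} R . \tag{14}$$

For the special case in which $\epsilon_{x,i} = \epsilon_x$ for all $i$, it will be convenient to use the shorthand $\epsilon$, where $\epsilon$ is
a length-$|\mathcal{X}|$ vector, and we will use the shorthand $R_{\mathrm{IB}}(\epsilon, R_p)$.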

C. Contributions 

We now state the main contributions of this paper. First, we find a valid strategy that achieves positive
rates for the cognitive radio. From the definition of a valid strategy, this implies that there exists a
sequence of codes such that the primary meets its target rate irrespective of $\epsilon_0$, $\{\epsilon_{x,i}\}_{i=1}^{\infty}$.

Proposition 1: For all $\nu > 0$ and corresponding $\epsilon_0$, $\{\epsilon_{x,i}\}_{i=1}^{\infty}$, $R_p$, the RIB function is
lower-bounded as

$$R_{\mathrm{IB}}(\epsilon_0, \{\epsilon_{x,i}\}_{i=1}^{\infty}, R_p) \ge \left( 1 - \frac{R_p}{1 - \epsilon_0} \right) \cdot C^* , \tag{15}$$

where $C^* = \max_{p(x)} I(X;Y)$. Moreover, there exists a valid strategy that achieves the above rates for
all $(\epsilon_0, \{\epsilon_{x,i}\}_{i=1}^{\infty}, R_p)$ satisfying $\nu > 0$.

Proposition 1 follows immediately from Theorem 1. Furthermore, we can precisely characterize the
capacity of the cognitive radio for the case of time-invariant interference on the primary, in which $\epsilon_{x,i} = \epsilon_x$
for all $x \in \mathcal{X}$.

Proposition 2: For all $\nu > 0$ and cases in which $\epsilon_{x,i} = \epsilon_x$ for all $i$, the RIB function is

$$R_{\mathrm{IB}}(\epsilon, R_p) = \max_{p(x) :\, \sum_x \epsilon_x p(x) \le 1 - R_p} I(X;Y) . \tag{16}$$

Moreover, there exists a valid strategy that achieves the above rates for all $(\epsilon, R_p)$ satisfying $\nu > 0$.

For this setting, we will refer to the constraint $\sum_x \epsilon_x p(x) \le 1 - R_p$ as the interference budget. Note that
the constraint is based on how much interference each of the cognitive radio's channel inputs generates
on the primary compared to how much is tolerable for the primary's desired performance.
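
To make the interference budget concrete, the maximization in (16) can be worked out in closed form for the noiseless channel of Example 1. The sketch below rests on assumptions that go beyond the text: all $P$ transmit symbols share one erasure probability $\epsilon_1 > \epsilon_0$, the channel is noiseless so that $I(X;Y) = H(X)$, and rates are in nats. Under these assumptions, if $t$ is the total probability placed on transmit symbols, the budget reads $t \le (1 - R_p - \epsilon_0)/(\epsilon_1 - \epsilon_0)$ and the entropy is $h(t) + t \log P$, which increases in $t$ up to $t = P/(P+1)$. The function names are hypothetical.

```python
import math

def binary_entropy(t):
    """Binary entropy h(t) in nats."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1.0 - t) * math.log(1.0 - t)

def rib_noiseless(eps0, eps1, rate_target, num_transmit):
    """Right side of (16) for the noiseless Example 1 channel (assumed setup)."""
    t_budget = (1.0 - rate_target - eps0) / (eps1 - eps0)  # interference budget
    t_free = num_transmit / (num_transmit + 1.0)           # unconstrained optimum
    t = max(0.0, min(t_budget, t_free))
    return binary_entropy(t) + t * math.log(num_transmit)

def fixed_codebook_bound(eps0, rate_target, num_transmit):
    """Lower bound (15) with C* = log(1 + P) for the noiseless channel."""
    return (1.0 - rate_target / (1.0 - eps0)) * math.log(num_transmit + 1.0)
```

For $\epsilon_0 = 0$, $\epsilon_1 = 1$, $R_p = 1/2$ and $P = 15$, `rib_noiseless` gives about 2.05 nats against about 1.39 nats from `fixed_codebook_bound`, which illustrates the gap between the two propositions when $P$ is large.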

Proposition 2 follows immediately from Theorem 2, which provides achievability, and Theorem 3, 
which provides the converse. These are stated in Sections III and IV, respectively. We note that Theorem 
2 relies on a more intricate valid strategy than the one in the proof of Theorem 1.

III. Achievable Strategies

In this section, we present two achievable strategies and state results on the rates the cognitive radio
can achieve while guaranteeing rate to the primary under various interference conditions. The first of
these, the fixed-codebook protocol, is a generalization of the approach considered in Example 1, in
which the cognitive radio becomes active only when the primary is meeting its target rate. We show
that this strategy is valid, i.e. the primary meets its rate target under unknown time-varying interference
characteristics, and can give equally general rate guarantees for the cognitive radio. The second strategy,
the codebook-adaptive protocol, builds on the first strategy to predict the amount of interference the
cognitive radio will generate on the primary and optimize its codebook to maximize its own rate. Like
the first strategy, this strategy is also valid, so the primary meets its rate target under unknown
time-varying interference characteristics. We provide rate guarantees for the cognitive radio under the more
limited set of unknown time-invariant interference characteristics, and in Section IV, we show that the
codebook-adaptive protocol provides the optimum rate for the cognitive radio within this set.

A. Fixed-Codebook Protocol 

Recall the approach considered in Example 1 over the noiseless channel. The silent symbol $x_{\mathrm{off}}$ is used
for each channel use when the primary is not meeting its target rate. Otherwise, one of the remaining $P$
symbols is used to send information about the message. As demonstrated in that example, this leads to
a rate proportional to $\log P$. However, this strategy appears to be wasteful in that $x_{\mathrm{off}}$ is not being used
to send information about the message.

One way to overcome this limitation is to group multiple channel uses into frames. Each frame is
either silent, consisting of only silent symbols $x_{\mathrm{off}}$, or active, consisting of any combination of all
$P + 1$ symbols, including $x_{\mathrm{off}}$. Clearly, over the active frames, this increases the rate since the available
channel input alphabet is larger. The main issues are:

• To find a rule by which the cognitive transmitter decides before each frame whether the frame will 
be silent or active. The cognitive transmitter then also needs some way of indicating its choice to 
the cognitive receiver. 

• To appropriately select the frame length. If the frame length is too short, then no rate gain is attained. 
Conversely, if the frame length is too large, then the non-interference guarantee given in (12) can 
no longer be respected. 

We now illustrate the approach in the context of Example 1. For the sake of concreteness, consider the
case in which the frame length $K_n = 3$ channel uses. For this illustration, we will assume the decision
to become active is governed by the threshold rule $\sum_{j=1}^{3 \lfloor (i-1)/3 \rfloor} (A_j - \frac{1}{2}) > 0$. Then a sample run may
look as follows:

$i$                                      1                   2                   3                   4                    5                   6                    7 ...
$\sum_{j=1}^{i-1} (A_j - \frac{1}{2})$   0                   1/2                 1                   1/2                  0                   -1/2                 -1 ...
$X_i$                                    $x_{\mathrm{off}}$  $x_{\mathrm{off}}$  $x_{\mathrm{off}}$  $x_{\mathrm{on},1}$  $x_{\mathrm{off}}$  $x_{\mathrm{on},P}$  $x_{\mathrm{off}}$ ...

Times $i = 4, 5, 6$ represent an active frame, where the channel input at time $i = 4$ is simply a beacon to
indicate to the decoder that the frame is active; the message information is sent over $i = 5, 6$. Despite the
fact that the primary meets the rate target $R_p = \frac{1}{2}$ over channel uses 2 and 3, the cognitive radio sends
the silent symbol $x_{\mathrm{off}}$ for the duration of the frame. Thus, one has to be careful to set the frame length
$K_n$ and transmission threshold to make sure the cognitive radio can achieve a significant rate. Likewise,
the cognitive radio sends the message information $(x_{\mathrm{off}}, x_{\mathrm{on},P})$ over channel uses 5 and 6 even though
the primary no longer exceeds the rate target 1/2. Thus, one has to be careful to set the frame length $K_n$
and transmission threshold so that the primary's rate satisfies (12), so the strategy is valid.

Fig. 3. In the fixed-codebook protocol, channel uses are grouped into units known as frames. At the start of a frame, the
cognitive radio encoder chooses to become active if the primary's packet rate $\frac{1}{k} \sum_{i=1}^{k} A_i$ is above a threshold $R_p + \gamma + o(1)$.
Otherwise, it stays silent for the frame, i.e. sends the symbol $x_{\mathrm{off}}$. On an active frame, the encoder uses a length-$\kappa_n$ repetition
code to signal to the decoder that it is active and sends a codeword over the remaining channel uses to convey additional bits
of the message.
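
The toy threshold rule from the illustration (frame length 3, threshold 0, target rate 1/2) can be sketched as follows. The function name is hypothetical, and the ARQ sequence in the usage note is an assumed example, not data from the paper.

```python
from fractions import Fraction

def toy_frame_schedule(arqs, frame_len=3):
    """Label each channel use 'active' or 'silent' under the toy rule.

    arqs[i-1] holds A_i. The frame covering uses start+1 .. start+frame_len
    is active iff sum_{j<=start} (A_j - 1/2) > 0, so the decision only uses
    ARQs observed before the frame begins.
    """
    half = Fraction(1, 2)
    schedule = []
    for start in range(0, len(arqs), frame_len):
        excess = sum(Fraction(a) - half for a in arqs[:start])
        state = "active" if excess > 0 else "silent"
        schedule.extend([state] * min(frame_len, len(arqs) - start))
    return schedule
```

For the assumed ARQ sequence `[1, 1, 0, 0, 0, 0, 1]`, the first frame is silent (no history), the second is active because the primary is half a packet ahead after three uses, and the third is silent because the primary has fallen one packet behind.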



We now use the intuition from the illustration above to construct the fixed-codebook protocol, which 
we will then prove is a valid strategy, as defined in Section II. Figure 3 provides an illustration of the 
fixed-codebook protocol. For convenience, we define 



$$S_k = \sum_{i=1}^{k} (A_i - R_p) , \tag{17}$$



which is positive at time k if and only if the primary is exceeding its target rate. 

1) Determining Silent Frames: As before, the cognitive radio makes a decision to be silent or active
over frames of length $K_n$ channel uses. Specifically, the following condition specifies the frames over
which the cognitive radio is silent:

$$X_j = x_{\mathrm{off}} \quad \text{if } S_{i_j} - i_j \gamma < K_n , \tag{18}$$

where $i_j = \lfloor (j - 1)/K_n \rfloor \cdot K_n$, and $\gamma$ is an additional parameter for setting the threshold along with $K_n$
to satisfy condition (12).

2) Active Frames: It remains to define what the cognitive transmitter does over an active frame. As
in the noiseless case, we want to inform the decoder that the frame is active, but in the noisy case, it
cannot be done with a single channel use.

a) Repetition Coding: The cognitive transmitter uses a length-$\kappa_n$ repetition code to inform the
cognitive receiver that the frame is active. We will assume without loss of generality there exists a
channel input $x_{\mathrm{rep}} \ne x_{\mathrm{off}}$ such that $P(Y = y \mid X = x_{\mathrm{off}}) \ne P(Y = y \mid X = x_{\mathrm{rep}})$ for some $y$. (Note: we
can assume this without loss of generality since if it were not true for any symbol, the cognitive radio's
channel inputs $X_i$ would be independent of the channel outputs $Y_i$, and the channel could not be used
for communicating in the first place.) Then the repetition code over the first $\kappa_n$ channel uses of an active
frame is specified by the following condition:

$$X_j = x_{\mathrm{rep}} \quad \text{if } S_{i_j} - i_j \gamma > K_n , \ i_j < j \le i_j + \kappa_n , \tag{19}$$

where $i_j = \lfloor (j - 1)/K_n \rfloor \cdot K_n$.

b) Message Information: For the remaining channel uses of an active frame, the encoder sends
information about the message to the decoder. It does so with a blocklength $K_n - \kappa_n$ codebook $\mathcal{C}_{\mathrm{fixed}}$
of rate $C^* - \delta$, where $C^* = \max_{p(x)} I(X;Y)$. We will denote codeword $m$ as $X^{K_n - \kappa_n}(m)$, where
$m \in \{1, \ldots, \exp\{(K_n - \kappa_n)(C^* - \delta)\}\}$.

The following notation will be useful for understanding the channel inputs during the remainder of an
active frame. Let $V_1$ denote the channel index preceding the start of the first active frame, $V_2$ the second,
$V_3$ the third, and so on. That is,

$$V_1 = \inf\{i \ge 0 : S_i - i\gamma > K_n , \ i = mK_n \text{ for some } m \in \mathbb{Z}\} , \tag{20}$$

$$V_k = \inf\{i > V_{k-1} : S_i - i\gamma > K_n , \ i = mK_n \text{ for some } m \in \mathbb{Z}\} . \tag{21}$$

We now characterize the remaining channel inputs. For the $\ell$-th active frame and letting $m_\ell$ be bits
$\ell (K_n - \kappa_n)(C^* - \delta) \log_2 e + 1$ through $(\ell + 1)(K_n - \kappa_n)(C^* - \delta) \log_2 e$ of message $W$,

$$X_j = X_{j - V_\ell - \kappa_n}(m_\ell) \quad \text{if } V_\ell + K_n \ge j > V_\ell + \kappa_n . \tag{22}$$

A summary of the fixed-codebook protocol is given in Table I.

TABLE I

Summary of the fixed-codebook protocol.

$X_j$                                  Conditions                                                            Description
$X_{j - V_\ell - \kappa_n}(m_\ell)$    $V_\ell + K_n \ge j > V_\ell + \kappa_n$, $\ell \in \mathbb{Z}^+$     Active frame: $X^{K_n - \kappa_n}(m_\ell) \in \mathcal{C}_{\mathrm{fixed}}$ to send fragment $m_\ell$.
$x_{\mathrm{rep}}$                     $V_\ell + \kappa_n \ge j > V_\ell$, $\ell \in \mathbb{Z}^+$           Repetition code: notify decoder of active frame
$x_{\mathrm{off}}$                     all other $j$                                                         Silent frame

3) Performance of the Fixed-Codebook Protocol: For this strategy, we have the following result.

Theorem 1: For all $R_p$, $\nu > 0$, there exist choices of $\kappa_n$, $K_n$, $\gamma$, $\delta$ such that the fixed-codebook protocol
is a valid strategy, i.e. the primary's packet rate satisfies the condition in (12) for all $\{\epsilon_{x,i}\}_{i=1}^{\infty}$, $x \in \mathcal{X} - \{x_{\mathrm{off}}\}$.
Furthermore, for these parameter choices, the rate

$$\left( 1 - \frac{R_p}{1 - \epsilon_0} \right) \cdot C^* \tag{23}$$

is achievable for the cognitive radio, where $C^* = \max_{p(x)} I(X;Y)$.

Proof: While other choices will work, for the purposes of the proof, we will let

$$K_n = \lfloor n^{1/8} \rfloor , \quad \kappa_n = \lfloor n^{1/16} \rfloor , \tag{24}$$

and any $\gamma$ satisfying $0 < \gamma < \min\{\delta/2, \nu/2\}$. We will assume that $\delta > 0$, but a detailed prescription is
given in Lemma 6 below to allow the rate loss to become arbitrarily small.
The proof of the theorem is divided into three parts.

1) The primary's rate satisfies condition (12), so the strategy is valid. (Lemma 3)

2) There exists a codebook such that the cognitive radio decoder error probability is small, thus satisfying
(13) for some $R$. (Lemma 4)

3) By appropriately choosing $\delta$, the $R$ in (13) can be made arbitrarily close to $\left( 1 - \frac{R_p}{1 - \epsilon_0} \right) \cdot C^*$.
(Lemma 6)

These results are proved in Appendix I. ■

B. Codebook-Adaptive Protocol

Let us return to Example 1. Theorem 1 implies that when the fixed-codebook protocol is applied to
the noiseless channel, the cognitive radio is guaranteed to achieve rates

$$R \ge \left( 1 - \frac{R_p}{1 - \epsilon_0} \right) \cdot \log(1 + P) . \tag{25}$$

Hence, the fixed-codebook protocol only adapts the duty-cycle to the actual degree of interference. In
equation (25), when P is large, this is not a good strategy. A better strategy would be to also adapt 
the codebook to the actual degree of interference, thereby guaranteeing a higher duty-cycle. Clearly, 
to still meet the interference guarantee the rate of the adapted codebook will typically be smaller, but 
with respect to equation (25), this penalty will appear in the logarithm. In this section, we propose the 
codebook-adaptive protocol, which first obtains a coarse measurement of the actual interference (Phase 
I), uses this to select a codebook appropriately and communicates its choice to the decoder (Phase II), 
and then runs the standard fixed-codebook protocol described in Section III-A (Phase III). 







Fig. 4. The codebook-adaptive protocol is like the fixed-codebook protocol except the first two active frames are used to select 
a codebook to use and inform the decoder about it. In Phase I, the cognitive radio sends pilots of each of its channel inputs 
and uses the ARQs to create estimates of the interference it generates on the primary. In Phase II, the cognitive radio notifies 
the decoder which among a polynomial sized set of codebooks it has selected based on its estimates from Phase I. Phase III, 
which immediately follows the end of Phase II above, is almost identical to the fixed-codebook protocol, except the codewords 
are now from the codebook selected during Phase I and Phase II. 



The codebook-adaptive protocol is summarized in Figure 4. The strategy is quite similar to the
fixed-codebook protocol. In fact, it uses the same threshold rule and the same repetition code to signify an
active frame. That is, the codebook-adaptive protocol follows the rules

$$X_j = x_{\mathrm{off}} \quad \text{if } S_{i_j} - i_j \gamma < K_n , \tag{26}$$

$$X_j = x_{\mathrm{rep}} \quad \text{if } S_{i_j} - i_j \gamma > K_n , \ i_j < j \le i_j + \kappa_n , \tag{27}$$

where $i_j = \lfloor (j - 1)/K_n \rfloor \cdot K_n$, which are identical to conditions (18) and (19) in the fixed-codebook protocol.

The difference between the two strategies is thus in what follows the repetition code in an active frame.
In particular, the encoder uses the first active frame to estimate the channel, the second to inform the
decoder which codebook it will use based on those estimates, and the third and later active frames to send
message information using the selected codebook.

As before, let $V_1$ denote the channel index preceding the start of the first active frame, $V_2$ the second,
$V_3$ the third, and so on. That is, $V_1 = \inf\{i \ge 0 : S_i - i\gamma > K_n , \ i = mK_n \text{ for some } m \in \mathbb{Z}\}$ and
$V_k = \inf\{i > V_{k-1} : S_i - i\gamma > K_n , \ i = mK_n \text{ for some } m \in \mathbb{Z}\}$.

1) Phase I: During the first active frame, the cognitive radio estimates the interference produced by
each channel input. Let $\mu = \lfloor \kappa_n |\mathcal{X}|^{-1} \rfloor$. Then for $x \in \{0, \ldots, |\mathcal{X}| - 1\}$, the channel inputs for the first
frame can be described as

$$X_j = x \quad \text{if } V_1 + \kappa_n + (x + 1)\mu \ge j > V_1 + \kappa_n + x\mu . \tag{28}$$

Using these channel inputs, the encoder can use the ARQs to estimate the primary's erasure probabilities:

$$\hat{\epsilon}_x = 1 - \frac{1}{\mu} \sum_{i = V_1 + \kappa_n + x\mu + 1}^{V_1 + \kappa_n + (x+1)\mu} A_i . \tag{29}$$

With these estimates, the end of this first active frame marks the end of Phase I. 
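
Phase I amounts to averaging the ARQs observed while each pilot symbol is on the channel. The sketch below assumes the estimate is the empirical erasure fraction over each block of $\mu$ pilot uses; this estimator, the function name, and the flat list layout of the pilot ARQs are assumptions made for illustration.

```python
def estimate_erasure_probs(pilot_arqs, num_inputs, mu):
    """Estimate the erasure probability induced by each channel input.

    pilot_arqs lists the ARQs A_i observed after the repetition code of the
    first active frame, so input x occupies positions x*mu .. (x+1)*mu - 1.
    Returns one empirical erasure fraction per input.
    """
    if len(pilot_arqs) < num_inputs * mu:
        raise ValueError("not enough pilot ARQs for every input")
    estimates = []
    for x in range(num_inputs):
        block = pilot_arqs[x * mu:(x + 1) * mu]
        estimates.append(1 - sum(block) / mu)  # fraction of erased packets
    return estimates
```

For example, with two inputs and $\mu = 4$, the pilot ARQs `[1, 1, 1, 1, 0, 0, 1, 0]` give estimates 0.0 for the first input and 0.75 for the second.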

2) Phase II: Based on the estimates $\hat{\epsilon}_x$, the encoder chooses a codebook among a set of codebooks;
it informs the decoder of this choice in Phase II.

Each codebook in the set has a different input distribution corresponding uniquely to each length-$C_n$
type $p_{\chi_{C_n}}$ of $\mathcal{X}$, i.e. $p_{\chi_{C_n}}$ is a probability distribution with the property that for all $x \in \mathcal{X}$, $p_{\chi_{C_n}}(x) = n_x/C_n$
such that $n_x$ is a nonnegative integer and $\sum_{x \in \mathcal{X}} n_x = C_n$. Thus, there are at most $(C_n + 1)^{|\mathcal{X}|}$
codebooks in the set. The codebook $\mathcal{C}_{\chi_{C_n}}$ of type $p_{\chi_{C_n}}$ is a random codebook with codewords generated
i.i.d. according to $\prod_{k=1}^{K_n - \kappa_n} p_{\chi_{C_n}}(x_k)$ and has

\[ M_{\chi_{C_n}} = \exp\{(K_n - \kappa_n)(R_{\chi_{C_n}} - \delta)^+\} , \tag{30} \]

where $R_{\chi_{C_n}}$ is the mutual information $I(X;Y)$ with $X$ having input probability distribution $p_{\chi_{C_n}}(x)$.
One then selects the codebook according to the following rule:

\[ \chi = \arg\max_{\chi_{C_n} :\ \sum_x \hat{\epsilon}_x p_{\chi_{C_n}}(x) \le 1 - R_p - 2\gamma - \delta} M_{\chi_{C_n}} . \tag{31} \]
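The selection rule can be illustrated with a brute-force search over types. The sketch below is an assumption-laden toy, not the authors' implementation: it enumerates every type with denominator `C`, keeps those whose estimated interference fits within a given `budget` (standing in for $1 - R_p - 2\gamma - \delta$), and returns the type with the largest mutual information, which is equivalent to maximizing the codebook size in (30). The channel matrix used in the usage example is that of Example 2 with $P = 2$, with $x_{\mathrm{off}}$ placed at index 0.

```python
import itertools
import math

def types(alphabet_size, C):
    """Yield every type (n_x / C) with nonnegative integer counts n_x
    summing to C, via a stars-and-bars enumeration."""
    total = C + alphabet_size - 1
    for cuts in itertools.combinations(range(total), alphabet_size - 1):
        prev = -1
        counts = []
        for c in cuts + (total,):
            counts.append(c - prev - 1)
            prev = c
        yield tuple(n / C for n in counts)

def mutual_information(p, W):
    """I(X;Y) in nats for input distribution p and channel matrix W[x][y]."""
    ny = len(W[0])
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(ny)]
    total = 0.0
    for x, px in enumerate(p):
        for y in range(ny):
            if px > 0 and W[x][y] > 0:
                total += px * W[x][y] * math.log(W[x][y] / q[y])
    return total

def select_codebook(W, eps_hat, budget, C):
    """Return (type, rate) maximizing I(X;Y) subject to
    sum_x eps_hat[x] * p(x) <= budget."""
    best = None
    for p in types(len(W), C):
        if sum(e * px for e, px in zip(eps_hat, p)) <= budget + 1e-12:
            r = mutual_information(p, W)
            if best is None or r > best[1]:
                best = (p, r)
    return best

if __name__ == "__main__":
    W = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]   # index 0 plays the role of x_off
    print(select_codebook(W, [0.0, 1.0, 1.0], budget=0.5, C=4))
```

The number of candidate types grows polynomially in `C`, which is what keeps the Phase II signalling overhead small.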

The encoder uses the codebook from the fixed-codebook protocol in the second active frame to inform
the decoder of its selected codebook. (Note: Based on the parameter choices considered in this
work, $(C_n + 1)^{|\mathcal{X}|}$ is small enough to require only a subset of the messages from the fixed-codebook protocol's
codebook, so the encoder simply uses the messages that result in the lowest probability of error.) Suppose the
selected codebook corresponds to message $m_\chi$. Then for the second active frame,

\[ X_j = X_{j - V_2 - \kappa_n}(m_\chi) \quad \text{if } V_2 + K_n \ge j > V_2 + \kappa_n . \tag{32} \]

With the decoder informed of which codebook has been selected, the end of this active frame marks the
end of Phase II.

3) Phase III: In Phase III, the active frames are now used to send message information. Thus, they
resemble the active frames in the fixed-codebook protocol, with the main difference that the codebook
$\mathcal{C}_\chi$ is used.

Let $m_\ell$ be bits $\ell(K_n - \kappa_n)(R_\chi - \delta)\log_2 e + 1$ through $(\ell + 1)(K_n - \kappa_n)(R_\chi - \delta)\log_2 e$ of message
$W$. For the $(\ell + 2)$-th active frame, we can express the message information segment of the frame as

\[ X_j = X_{j - V_{\ell+2} - \kappa_n}(m_\ell) \quad \text{if } V_{\ell+2} + K_n \ge j > V_{\ell+2} + \kappa_n , \tag{33} \]

where $X^{K_n - \kappa_n}(m_\ell) \in \mathcal{C}_\chi$.

A summary of the codebook-adaptive protocol is given in Table II. 

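TABLE II

Summary of the codebook-adaptive protocol.

$X_j$ | Conditions | Description
$x$ | $V_1 + \kappa_n + (x+1)\mu \ge j > V_1 + \kappa_n + x\mu$ | Phase I: estimate $x$'s interference with primary's ARQs.
$X_{j - V_2 - \kappa_n}(m_\chi)$ | $V_2 + K_n \ge j > V_2 + \kappa_n$ | Phase II: $X^{K_n - \kappa_n}(m_\chi) \in \mathcal{C}_{\mathrm{fixed}}$ for selection $\mathcal{C}_\chi$.
$X_{j - V_{\ell+2} - \kappa_n}(m_\ell)$ | $V_{\ell+2} + K_n \ge j > V_{\ell+2} + \kappa_n$, $\ell \in \mathbb{Z}^+$ | Phase III: $X^{K_n - \kappa_n}(m_\ell) \in \mathcal{C}_\chi$ to send fragment $m_\ell$.
$x_{\mathrm{rep}}$ | $V_\ell + \kappa_n \ge j > V_\ell$, $\ell \in \mathbb{Z}^+$ | Repetition code: notify decoder of active frame.
$x_{\mathrm{off}}$ | all other $j$ | Silent frame.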

We now state the result for the codebook-adaptive protocol. 

Theorem 2: For all $R_p, \nu > 0$, there exists a choice of $K_n, \kappa_n, C_n, \gamma, \delta$ such that the codebook-adaptive
protocol is a valid strategy, i.e. the primary's packet rate satisfies the condition in (12) for all $\{\epsilon_{x,i}\}_{i=1}^{\infty}$,
$x \in \mathcal{X} - \{x_{\mathrm{off}}\}$. Furthermore, when the interference on the primary is time-invariant, i.e. $\epsilon_{x,i} = \epsilon_x$ for
$x \in \mathcal{X} - \{x_{\mathrm{off}}\}$, the rate

\[ \max_{p(x):\ \sum_x \epsilon_x p(x) \le 1 - R_p} I(X;Y) \tag{34} \]

is achievable for the cognitive radio under the same parameter settings.
Proof: The parameters $K_n, \kappa_n$ will be set as in (24), $C_n = \sqrt{\kappa_n}$, and any $\gamma$ satisfying $0 < \gamma <
\min\{\delta/2, \nu/2\}$. We will assume $\delta > 0$, but a detailed prescription is given in Lemma 12 below to get
arbitrarily close to the rate in the statement of the theorem.
The proof of the theorem is divided into three parts.

1) As in the fixed-rate protocol, we can apply Lemma 3 since (18) and (26) are identical conditions.
Thus, the primary's rate satisfies the condition (12), so the strategy is valid.

2) The cognitive radio decoder error probability is small, thus satisfying (13) for some $R$. (Lemma 7)

3) By appropriately choosing $\delta$, the $R$ in (13) can be made arbitrarily close to $R_{\mathrm{IB}}(\epsilon, R_p)$ with
probability going to 1 as $n \to \infty$. (Lemma 12)

With the exception of Lemma 3, these results are proved in Appendix II. ■

IV. Converse 

To show the converse, we will relax the conditions stipulated in the problem setup, thereby allowing 
a larger class of strategies. It turns out that in some cases, this larger class does not increase the rate 
region. 

Theorem 3: For all $\nu > 0$ and for $\epsilon_{x,i} = \epsilon_x$ for all $i$,

\[ R_{\mathrm{IB}}(\epsilon, R_p) \le \max_{p(x):\ \sum_x \epsilon_x p(x) \le 1 - R_p} I(X;Y) . \tag{35} \]

Proof: From the definition of achievable rate,

\begin{align}
nR &\le H(W_{\lceil n(R-\delta) \rceil}) + n\delta + 1 \tag{36} \\
&\le I(W_{\lceil n(R-\delta) \rceil}; Y^n) + 2n\delta + 1 \tag{37} \\
&= \sum_{i=1}^{n} \left[ H(Y_i | Y^{i-1}) - H(Y_i | Y^{i-1}, W_{\lceil n(R-\delta) \rceil}) \right] + 2n\delta + 1 \tag{38} \\
&\le \sum_{i=1}^{n} \left[ H(Y_i) - H(Y_i | Y^{i-1}, W, A^{i-1}, X_i) \right] + 2n\delta + 1 \tag{39} \\
&= \sum_{i=1}^{n} \left[ H(Y_i) - H(Y_i | X_i) \right] + 2n\delta + 1 \tag{40} \\
&= \sum_{i=1}^{n} I(X_i; Y_i) + 2n\delta + 1 , \tag{41}
\end{align}

where (37) follows from Fano's inequality, (38) from the chain rule, (39) since conditioning cannot
increase entropy, (40) by the Markov chain $(W, A^{i-1}, Y^{i-1}, X^{i-1}) \leftrightarrow X_i \leftrightarrow Y_i$, and (41) by definition.
We have yet to place a restriction on the strategies. Recall that valid strategies need to satisfy the
condition in (12). If condition (12) is satisfied, then the code of blocklength $n$ satisfies

\[ n^{-1} \sum_{i=1}^{n} E[A_i] \ge R_p - K_{1,R_p,\nu}\, e^{-n K_{2,R_p,\nu}} . \tag{42} \]

We now consider only this weaker condition on the channel inputs as opposed to the stronger one given
by (12). By the concavity of mutual information with respect to its input distribution, we can combine
(41) and (42) to yield that for all $\delta > 0$, there exists large enough $n$ such that

\[ R \le \max_{p(x):\ \sum_x \epsilon_x p(x) \le 1 - R_p + \delta} I(X;Y) + 3\delta . \tag{43} \]

Since $\delta$ can be made arbitrarily small, we can conclude the result. ■

V. Examples 

Propositions 1 and 2 provide a lower bound and an exact result for the RIB function under different 
interference conditions, respectively. In this section, we evaluate the RIB function given in Proposition 2 
for cases in which the interference characteristics on the primary are time-invariant. We then evaluate the 
RIB function lower bound given in Proposition 1 for these examples when the interference characteristics 
are time-varying.

A. Evaluation of the RIB Function for Time-Invariant Interference Characteristics 

We first explore the setting in which the interference parameters $\epsilon_{x,i}$ are time-invariant, i.e. $\epsilon_{x,i} = \epsilon_x$
for all $i, x$. In this setting, Proposition 2 gives an exact expression for the RIB function $R_{\mathrm{IB}}(\epsilon, R_p)$.
We begin by evaluating the RIB function for Example 1, rewriting the expression in Proposition 2 as

\begin{align}
R_{\mathrm{IB}}(\epsilon, R_p) &= \max_{p(x):\ \sum_x \epsilon_x p(x) \le 1 - R_p} I(X;Y) \tag{44} \\
&= \max_{p(x):\ p(x \ne x_{\mathrm{off}})\epsilon_1 + p(x = x_{\mathrm{off}})\epsilon_0 \le 1 - R_p} H(X) \notag \\
&= \max_{p \le \frac{1 - R_p - \epsilon_0}{\epsilon_1 - \epsilon_0}} h(p) + p \log P \notag \\
&= \begin{cases} \log(P + 1) , & b \ge \frac{P}{P+1} \\ h(b) + b \log P , & \text{otherwise} \end{cases} \tag{45}
\end{align}

where $b = \frac{1 - R_p - \epsilon_0}{\epsilon_1 - \epsilon_0}$. Figure 5 shows (45) in terms of $b$, which we can think of as a summary of the
interference budget.

Fig. 5. A schematic plot of the RIB function for Example 1 when P = 1.
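The closed form (45) is easy to check numerically. The Python sketch below evaluates it and compares it with a grid search over the constrained maximization of $h(p) + p \log P$. It assumes $\epsilon_1 > \epsilon_0$ and $0 \le b$, uses natural logarithms, and the function names are illustrative only.

```python
import math

def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def rib_example1(R_p, eps0, eps1, P):
    """Closed form (45), assuming eps1 > eps0 and a nonnegative budget b."""
    b = (1.0 - R_p - eps0) / (eps1 - eps0)
    if b >= P / (P + 1.0):
        return math.log(P + 1.0)
    return h(b) + b * math.log(P)

def rib_example1_grid(R_p, eps0, eps1, P, steps=10_000):
    """Grid search over p <= min(b, 1) of h(p) + p log P, as a cross-check."""
    b = min(1.0, (1.0 - R_p - eps0) / (eps1 - eps0))
    best = 0.0
    for i in range(steps + 1):
        p = i / steps
        if p <= b + 1e-12:
            best = max(best, h(p) + p * math.log(P))
    return best

if __name__ == "__main__":
    for R_p in (0.2, 0.6):
        print(R_p, rib_example1(R_p, 0.1, 0.9, 1), rib_example1_grid(R_p, 0.1, 0.9, 1))
```

For $P = 1$ the function saturates at $\log 2$ once $b \ge 1/2$, which is the plateau sketched in Figure 5.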



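Example 2: Consider a DMC with $|\mathcal{X}| = 1 + P$ channel input symbols and $|\mathcal{Y}| = P$ output symbols
with the following property:

\[ P(Y = y | X = x) = \begin{cases} 1 , & y = x ,\ x \in \{1, \ldots, P\} \\ \frac{1}{P} , & y \in \mathcal{Y} ,\ x = x_{\mathrm{off}} . \end{cases} \tag{46} \]

If $\epsilon_0 = 0$, $\epsilon_x = 1$ for $x \in \{1, \ldots, P\}$, then evaluating the RIB function from Proposition 2 yields

\[ R_{\mathrm{IB}}(\epsilon, R_p) = (1 - R_p) \log_2 P , \tag{47} \]

where the units are in bits per channel use.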

We now consider a case in which the secondary has an alternative to $x_{\mathrm{off}}$ to control interference. The
channel model resembles the one in Example 2, except there are now additional channel inputs.

Example 3: Let $P$ be even and consider a DMC with $|\mathcal{X}| = 1 + 3P/2$ channel input symbols and
$|\mathcal{Y}| = P$ output symbols with the following property:

\[ P(Y = y | X = x) = \begin{cases} 1 , & y = x ,\ x \in \{1, \ldots, P\} \\ \frac{1}{P} , & y \in \mathcal{Y} ,\ x = x_{\mathrm{off}} \\ \frac{1}{2} , & y = 2(x - P) - 1 ,\ x \in \{P + 1, \ldots, P + P/2\} \\ \frac{1}{2} , & y = 2(x - P) ,\ x \in \{P + 1, \ldots, P + P/2\} . \end{cases} \tag{48} \]

An illustration of these transition probabilities is given in Figure 6. We now consider the case in which
$\epsilon_0 = 0$, $0 < R_p < 1$, $\epsilon_x = 1$ for $x \in \{1, \ldots, P\}$, and $\epsilon_x = \epsilon_{1/2} < 1 - R_p$ for $x \in \{P + 1, \ldots, P + P/2\}$.

Fig. 6. Illustration of transition probabilities for Example 3 when P = 4.



Under these assumptions, evaluating the RIB function from Proposition 2 yields

\begin{align}
R_{\mathrm{IB}}(\epsilon, R_p) &= \max_{p(x):\ \sum_x \epsilon_x p(x) \le 1 - R_p} I(X;Y) \tag{49} \\
&= \frac{1 - R_p - \epsilon_{1/2}}{1 - \epsilon_{1/2}} \log_2 P + \frac{R_p}{1 - \epsilon_{1/2}} \log_2(P/2) \tag{50} \\
&= \log_2 P - \frac{R_p}{1 - \epsilon_{1/2}} , \tag{51}
\end{align}

where the units are in bits per channel use. Note that the rate loss due to the primary can be at most 1
bit in this setting. Moreover, this can be arbitrarily better than the case in Example 2 by making $P$ large
and $R_p$ close to 1, for which the target rate $R_p$ induced a multiplicative penalty on the $\log_2 P$ term in
(47).
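A short numerical comparison makes the gap between (47) and (51) tangible. The Python sketch below evaluates both closed forms, and also the mixture form (50) as a consistency check; the function names and the chosen parameter values are illustrative assumptions.

```python
import math

def rib_example2(R_p, P):
    """RIB function of Example 2, equation (47), in bits per channel use."""
    return (1.0 - R_p) * math.log2(P)

def rib_example3(R_p, eps_half, P):
    """RIB function of Example 3, equation (51); requires eps_half < 1 - R_p."""
    return math.log2(P) - R_p / (1.0 - eps_half)

def rib_example3_mixture(R_p, eps_half, P):
    """Equation (50): time-share between the P full-interference inputs
    and the P/2 reduced-interference inputs."""
    a = (1.0 - R_p - eps_half) / (1.0 - eps_half)
    return a * math.log2(P) + (1.0 - a) * math.log2(P / 2.0)

if __name__ == "__main__":
    R_p, eps_half, P = 0.9, 0.05, 1024
    print("Example 2:", rib_example2(R_p, P))
    print("Example 3:", rib_example3(R_p, eps_half, P))
```

With $P = 1024$ and $R_p = 0.9$, Example 2 yields about 1 bit per channel use while Example 3 yields about 9 bits, since the penalty in (51) is additive and bounded by 1 bit rather than multiplicative.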

B. Further Considerations for Time-Varying Interference Characteristics 

The most interesting and realistic scenarios concern the case when the interference characteristics are 
time-varying. The codebook-adaptive protocol introduced in Section III can deal with this as long as 
it is well behaved. However, for some "maliciously chosen" time-varying characteristics, the proposed 
strategy can be fooled into choosing a low rate codebook in Phase II when the interference conditions
are less severe in Phase III. The effect of such a possibility is illustrated in Figure 7. One option might
be to consider a strategy that periodically readapts the codebook, which, while potentially beneficial, is
outside the scope of this work. Instead, we consider the simpler strategy given by the fixed-codebook
protocol.

Fig. 7. A schematic plot of the RIB function for Example 1 when P = 1. The dashed line suggests how maliciously chosen 
time-varying characteristics can cause the encoder to select a "low rate" codebook, which saturates well below the actual RIB 
function when the interference budget is large. 



For Example 1, Proposition 1 implies the fixed-codebook protocol lets the cognitive radio achieve the
rate

\[ R_{\mathrm{IB}}(\epsilon_0, R_p) \ge \left(1 - \frac{R_p}{1 - \epsilon_0}\right) \log(1 + P) \tag{52} \]

for all $\{\epsilon_{x,i}\}_{i=1}^{\infty}$, $x \ne x_{\mathrm{off}}$. For the restricted time-invariant interference setting of Example 1, we can use
$b = \frac{1 - R_p - \epsilon_0}{\epsilon_1 - \epsilon_0}$ to compare its performance against the RIB function. It turns out that for $b > b^*$, the
codebook chosen by the codebook-adaptive protocol has the same asymptotic rate and produces the same
interference on the primary as that in the fixed-codebook protocol. Thus, depending on one's assumptions
about the interference environment, there are instances in which the fixed-codebook protocol may be
preferable to the codebook-adaptive protocol.

Despite these guarantees, there are situations in which the fixed-codebook protocol can be arbitrarily
worse. Recall Examples 2 and 3. It turns out that in both cases when $\epsilon_0 = 0$, Proposition 1 implies the
fixed-codebook protocol guarantees rates given by

\[ R_{\mathrm{IB}}(0, \{\epsilon_{x,i}\}_{i=1}^{\infty}, R_p) \ge (1 - R_p) \log_2 P , \tag{53} \]

which matches the RIB function in (47) for Example 2. However, as already illustrated, by making $P$
large and $R_p$ close to 1, the RIB function in Example 3, given in (51), can be made arbitrarily larger
than the one in Example 2. This implies that the loss for applying the fixed-codebook protocol can
be significant. Thus, one's choice between these two protocols depends jointly on the cognitive radio's
channel and the interference generated on the primary. Indeed, there may exist strategies that can trade
off the competing desires of optimality and robustness better than the ones proposed. These are discussed
further in the next section.

VI. Discussion 

In this paper, a novel model was proposed for a cognitive radio problem. The basic problem is that the 
cognitive radio must not disturb the primary user (i.e., the license holder). The specific aspect of our model 
is that the cognitive radio is ignorant of the channel characteristics according to which it interferes with 
the primary. To mitigate this uncertainty, the cognitive radio may eavesdrop on the primary system's ARQ 
feedback signal. We show how this can be exploited to design two adaptive cognitive radio strategies, 
each of which provides a fixed rate guarantee to the primary and variable rate guarantee to the cognitive 
radio that depends on its interference budget, the amount of interference it is allowed to generate on the 
primary user. The problem statement and results provide a starting point for new research directions and 
problems, some of which we briefly outline in the sequel. 

A. Gaussian Channels 

In this work, the cognitive radio's channel is a DMC with each symbol affecting the primary's erasure 
probability. An analogous model and result for the Gaussian setting would be desirable to gain further 
intuitions about the design of a cognitive radio system. For instance, if the primary employs a Gaussian 
codebook that assumes a certain level of interference, the cognitive radio may use the ARQs to choose 
the highest power codebook that maintains that level of interference on the primary. 

B. Primary with a Fixed Delay Constraint 

In our model, the cognitive radio must operate such that eventually, the primary attains its prespecified
target rate. A more restrictive setting would be to also enforce a delay constraint. That is, the cognitive 
radio must operate such as to not delay packets by more than a certain prespecified bound. Alternatively, 
this can be formulated as a "sliding window" rate constraint: over any window of a prespecified length, 
the primary must attain its prespecified rate. It would be interesting to understand by how much this 
lowers the "interference budget" of the cognitive radio, and thus, its capacity. 

C. Improved Strategies 

The cognitive radio's rate guarantees for the fixed-codebook protocol are somewhat pessimistic, and
the rate guarantees for the codebook-adaptive protocol are restricted to the smaller class of time-invariant
interference parameters on the primary. The problem is that since the codebook-adaptive protocol only
selects the codebook once, varying the interference conditions over time can lead to suboptimal
performance. For instance, the interference conditions in Phase I can be such that the cognitive radio selects a
codebook with negligible rate in Phase II only to discover that there is no interference to the primary
in Phase III. Thus, its performance can be significantly worse than the fixed-codebook protocol in the
time-varying setting.

An obvious alternative would be a strategy that periodically readapts the codebook, which, if done 
properly, may be able to provide stronger rate guarantees than those already provided in the time-varying 
setting. One may also wish to restrict the set of codebooks so that all codebooks have a rate above a 
certain threshold. Then, arguments similar to those used for the fixed-codebook protocol can provide rate 
guarantees for the time-varying case, and one can also exploit the advantage afforded by adapting one's 
codebook for the time-invariant case. 

D. Multiple Cognitive Radios 

In our model, there is only a single cognitive radio interfering with the primary. A more interesting 
situation will involve multiple cognitive radios all competing for the same interference budget. Clearly, 
this significantly changes the dynamics of the problem. Are there efficient strategies that give good rates 
for the cognitive radios while respecting the primary user? First of all, if all the cognitive radios have 
access to $A_k$ with different delays, then the arguments in this work would need to be extended. The
existence of multiple users also leads to the issue that any individual cognitive radio may not cause 
significant interference to the primary by itself, but the aggregate interference from all cognitive radios 
can still be quite large. Another issue to consider is how the cognitive radios might divide their rate in an 
equitable way based not only on their own channels but also on how much interference each generates 
on the primary. 

E. Noisy feedback 

In our model, the cognitive radio has a perfect observation of the ARQ signal of the primary, i.e., of
the values of $A_k$. However, in practice there may be noise that corrupts the encoder's knowledge of $A_k$.
This may also play a crucial role for the case of multiple cognitive radios, in which the noise may be
different for different terminals in the system.

Appendix I
Proof of Theorem 1 

A. Primary Meets Rate Target 

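Lemma 1: Let $S_k$ be defined as in (17). The sequence of random variables

\[ M_k = \exp\Big\{ \lambda S_k - k\big(f_\lambda(\epsilon_0) - \lambda R_p\big) - \sum_{j=1}^{k} \tau_j \big(f_\lambda(\epsilon_{X_j,j}) - f_\lambda(\epsilon_0)\big) \Big\} , \tag{54} \]

where $f_\lambda(\epsilon) = \log((1 - \epsilon)e^{\lambda} + \epsilon)$ and $\tau_k = 1\{S_{\lfloor k/K_n \rfloor K_n} - K_n - \lfloor k/K_n \rfloor K_n \gamma \ge 0\}$, forms a martingale.
Proof: First observe that we can express $M_k$ in terms of the recurrence equation

\[ M_k = M_{k-1} \exp\big\{ \lambda A_k - f_\lambda(\epsilon_0) - \tau_k \big(f_\lambda(\epsilon_{X_k,k}) - f_\lambda(\epsilon_0)\big) \big\} . \tag{55} \]

From this, we find that

\begin{align}
E[M_k | M_0, \ldots, M_{k-1}] &= E[M_k | S_0, \ldots, S_{k-1}] \tag{56} \\
&= M_{k-1}\, e^{-f_\lambda(\epsilon_0) - \tau_k (f_\lambda(\epsilon_{X_k,k}) - f_\lambda(\epsilon_0))}\, E[e^{\lambda A_k} | S_0, \ldots, S_{k-1}] \tag{57} \\
&= M_{k-1} . \tag{58}
\end{align}

■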

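Lemma 2: Let $S_k$ be defined as in (17) and $r$ be a positive integer and define the stopping time

\[ N = r \cdot \inf\{i \ge 0 : S_{ir} - ir\gamma - r \ge 0\} . \tag{59} \]

Then for $R_p + \gamma < 1 - \epsilon_0$ and $s \le 0$,

\[ P(N > t | S_0 = s) \le \left( \frac{(1 - R_p - \gamma)(1 - \epsilon_0)}{(R_p + \gamma)\epsilon_0} \right)^{2r - s} e^{-t D(1 - R_p - \gamma \| \epsilon_0)} . \tag{60} \]

Proof: Consider the martingale (see Lemma 1)

\[ M_k = \exp\Big\{ \lambda S_k - k\big(f_\lambda(\epsilon_0) - \lambda R_p\big) - \sum_{j=1}^{k} \tau_j \big(f_\lambda(\epsilon_{X_j,j}) - f_\lambda(\epsilon_0)\big) \Big\} , \tag{61} \]

where $f_\lambda(\epsilon) = \log((1 - \epsilon)e^{\lambda} + \epsilon)$ and $\tau_k = 1\{S_{\lfloor k/K_n \rfloor K_n} - K_n - \lfloor k/K_n \rfloor K_n \gamma \ge 0\}$. The optional stopping theorem
[21, Thm. 4.7.4, p. 270] implies

\[ e^{\lambda s} = E[M_{N \wedge m} | S_0 = s] , \tag{62} \]

where $N \wedge m$ denotes their minimum. We can substitute $\lambda = \log \frac{(R_p + \gamma)\epsilon_0}{(1 - R_p - \gamma)(1 - \epsilon_0)}$, which is nonpositive
by assumption, into equation (62) to get that

\[ \left( \frac{(R_p + \gamma)\epsilon_0}{(1 - R_p - \gamma)(1 - \epsilon_0)} \right)^{s} \ge \left( \frac{(R_p + \gamma)\epsilon_0}{(1 - R_p - \gamma)(1 - \epsilon_0)} \right)^{2r} E\big[ e^{(N \wedge m) D(1 - R_p - \gamma \| \epsilon_0)} \,\big|\, S_0 = s \big] , \tag{63} \]

where the inequality follows since $S_k$ has bounded increments that are less than 1 almost surely and
the stopping time increases in multiples of $r$ increments. By the monotone convergence theorem [21, p. 15,
Theorem 1.3.6], letting $m \to \infty$ gives

\[ E\big[ e^{N D(1 - R_p - \gamma \| \epsilon_0)} \,\big|\, S_0 = s \big] \le \left( \frac{(1 - R_p - \gamma)(1 - \epsilon_0)}{(R_p + \gamma)\epsilon_0} \right)^{2r - s} . \tag{64} \]

Using this moment inequality, we can apply a Chernoff bound to conclude

\[ P(N > t | S_0 = s) \le \left( \frac{(1 - R_p - \gamma)(1 - \epsilon_0)}{(R_p + \gamma)\epsilon_0} \right)^{2r - s} e^{-t D(1 - R_p - \gamma \| \epsilon_0)} . \tag{65} \]

■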

Lemma 3: Given a strategy in which for all $n$, every code of blocklength $n$ satisfies (18), if $\gamma > 0$ is
chosen so that $\gamma < \nu/2$, then the strategy is valid. That is, for all $\{\epsilon_{x,i}\}_{i=1}^{\infty}$,

\[ P\Big( \ell^{-1} \sum_{k=1}^{\ell} A_k < R_p \Big) \le K_{1,R_p,\nu}\, e^{-\ell K_{2,R_p,\nu}} , \tag{66} \]

where $0 < K_{2,R_p,\nu} < \infty$, $K_{1,R_p,\nu} < \infty$, and $K_{1,R_p,\nu}\, e^{-\ell K_{2,R_p,\nu}} \to 0$ as $\ell \to \infty$.

Proof: When $S_\ell > 0$, the primary is meeting its rate target. Furthermore, if $S_{iK_n} - K_n - iK_n\gamma \ge 0$,
then the primary will be guaranteed to meet its rate target over the next primary frame. Thus, it suffices
to consider frames when $S_{iK_n} - K_n - iK_n\gamma < 0$, which correspond directly to silent frames. To consider
what happens in these settings, we define stopping times to threshold $S_\ell - \ell\gamma$:

\begin{align}
N_{2k-1} &= \inf\{\ell > N_{2k-2} : S_\ell - \ell\gamma \ge 1\} , \tag{67} \\
N_{2k} &= \inf\{\ell > N_{2k-1} : S_\ell - \ell\gamma < 1\} . \tag{68}
\end{align}

Negative deviations occur only when $N_{2k} \le \ell < N_{2k+1}$, so

\begin{align}
P(S_\ell < 0) &\le \sum_k P(-S_\ell + \ell\gamma > \ell\gamma \,|\, N_{2k} \le \ell < N_{2k+1})\, P(N_{2k} \le \ell < N_{2k+1}) \tag{69} \\
&\le \sum_k P(N_{2k+1} - N_{2k} > \ell\gamma \,|\, N_{2k} \le \ell < N_{2k+1})\, P(N_{2k} \le \ell < N_{2k+1}) \tag{70} \\
&\le \sum_k P(N_{2k+1} - N_{2k} > \ell\gamma \,|\, N_{2k} < \infty)\, P(N_{2k} \le \ell < N_{2k+1} \,|\, N_{2k+1} - N_{2k} > \ell\gamma,\ N_{2k} < \infty) \tag{71} \\
&\le \frac{\ell}{2} \max_{1 \le 2k \le \ell} P(N_{2k+1} - N_{2k} > \ell\gamma \,|\, N_{2k} < \infty) , \tag{72}
\end{align}

where (69) follows from the law of total probability; (70) since the bounded increments of $S_\ell$ imply that
$-S_\ell + \ell\gamma < N_{2k+1} - N_{2k}$ given $N_{2k} \le \ell < N_{2k+1}$; (71) from Bayes' theorem and the fact that probabilities are
bounded from above by 1; and (72) since probabilities are bounded from above by 1 and, for $N_{2k} \le \ell$,
it is necessary that $2k \le \ell$ by the definition of $N_{2k}, N_{2k+1}$. Since $S_n$ is a Markov chain, for all $k$,
Lemma 13 implies that we only need to consider

V(N 2k+1 - N 2k > £ 7 \N 2k < oo) < max P(JV X > £ 7 |S = *) • (73) 

se[-i,o] 

(1 ~ Rp ~ 7)(1 ~ £ 0) \ 2 -£0(1-^-71160) 



where (74) follows from Lemma 2. Since the above holds for all k, substituting it into (72) gives 

nse < 0) 

- R p - 7 )(1 - 6 ) \ 2 e - <g( i-^- 7 || eo) 5 
(-Rp + 7)eo / 

( 1 _ i? 2 p _ 7 • (^(1 " ^ P " 7lko) - log - ID(1 -R p - 7 ||e )) (76) 

(- 1 _ i ^ p _ 7 • log - - 2/(1 - i?p - 7 ))£>(1 - R P ~ 7lko)) (77) 

^ 2 ' 6XP (" 1-4,-7 ' bg IpT^ " (£ " 2/(1 " ^ " 7))jD(1 " ^ " 7l|£0) ) ' (78) 
where the last line follows since 1 — eo > R p + v. For t > 2/(1 — R p — 7), we use the fact that 
D(l -R p - 7||e ) > (1 ~ Rp ~ 7 ~ £o)2 to get that 









~ 2 




_ I 




~ 2 


• exp 


_ £ 
~ 2 


■ exp 







nSe < 0) 

^ exp (-^ , og to _ , _ 2/(1 _ , p _ 7)) (1 -^-— (79) 



(80) 



where the last line follows by assumption. Note that this expression goes to as I — > 00. By letting 

#2,^,1/ = X' and K i,Rp,v,e to be ( 8 °) divided by e ~ eK2 - R e-" for £ > 2/(1 - Rp - 7), then we can write 

¥(S e < 0) < Ki, Rp ,^e-^ P - . (81) 

For $\ell < 2/(1 - R_p - \gamma)$, we simply choose $K_{1,R_p,\nu}$ to make the probability upper bound 1. Then we
can conclude our result by simply recalling the definition of $S_\ell$ in (17). ■

B. Decoder Error

Lemma 4: Define error events as follows:

1) $E_1$: $\exists$ frame in which the decoder misidentifies it as active or silent.

2) $E_2$: $\exists$ frame in which the decoder misidentifies the codeword.

Then, for $\delta > 0$, $\kappa_n = \lfloor n^{1/16} \rfloor$, $K_n = \lfloor n^{1/8} \rfloor$ in the fixed-codebook protocol, as $n \to \infty$,

\[ P(E_1 \cup E_2) \to 0 . \tag{82} \]

Proof: Note that we can bound the error as

\[ P(E_1 \cup E_2) \le P(E_1) + P(E_2 | E_1^c) . \tag{83} \]

First consider the event of misidentifying whether transmission is taking place over a frame. There are
$n/K_n$ frames, and an error occurs if misidentification happens over any one of them. Thus, by taking a union
bound over all frames and applying Lemma 15 for the error of the repetition code used to distinguish
these frames, the error probability is bounded by

\[ P(E_1) \le \frac{n}{K_n}\, e^{-\kappa_n \cdot r} , \tag{84} \]

where $r > 0$ is independent of $n$.

Finally, we can consider the error corresponding to misidentifying the codewords sent in each frame.
By Lemma 14 and a union bound per frame, we also have that

¥(E 2 \E{) < e 8/e 2 +4(io E | y i)^ . (85) 

Combining (84) and (85) with (83), we complete the proof. ■ 

C. Rate Analysis 

The rate achievable by the cognitive radio is directly proportional to the fraction of frames in which
it is active. Thus, we consider a bound on the number of frames the cognitive radio is guaranteed to be active.
Lemma 5: For all $\delta > 0$ and $\gamma < \delta/2$, there exists an $n_0(\delta)$ such that for $n > n_0(\delta)$,



1V , 1 - R p - 60 \ f { 1 + 5/2 _ x 1 

f t z^ Tk - — T^T J - exp [ ~ [ — 2 n 



k=l 

where $\tau_k$ is an indicator random variable to denote that $k$ is in an active frame. That is,

\[ \tau_k = 1\{\exists\, \ell \in \mathbb{Z}^+ \text{ such that } V_\ell < k \le V_\ell + K_n\} , \]

where $V_\ell$ is defined in (21). Furthermore, if $K_n = o(n)$, (86) goes to 0 as $n \to \infty$.



(86)

Proof: Note that if the sequence $\epsilon_{x,k} = 1$ for all $k$, $x \ne x_{\mathrm{off}}$, and the transmitter does not use
the symbol $x_{\mathrm{off}}$ during active frames, all channel uses in an active frame result in an erased packet for
the primary. Thus, given $\epsilon_0$, this case provides a convenient albeit conservative way to lower bound the
fraction of active frames. We will assume it in the sequel.

Recalling the definition of $S_n$ in (17) and the definition of $X_j$ in (18), $S_n$ can be no more than $n\gamma + 2K_n$
during the course of a silent frame. Furthermore, it can only decrease from this during an active frame
since $\epsilon_{x,k} = 1$, $x \ne x_{\mathrm{off}}$. Thus, we are guaranteed almost surely that

\[ S_n \le n\gamma + 2K_n . \tag{87} \]

Recalling the definition of $S_n$ in (17), we can define

\[ \tilde{S}_n = S_n + \sum_{k=1}^{n} \tau_k (1 - \epsilon_0) - n(1 - R_p - \epsilon_0) . \tag{88} \]

Thus, we can rewrite the problem as showing that

\begin{align}
P\Big( \sum_{k=1}^{n} \tau_k (1 - \epsilon_0) - n(1 - R_p - \epsilon_0) < -n\delta \Big) &= P(\tilde{S}_n - S_n < -n\delta) \tag{89} \\
&= P(\tilde{S}_n < S_n - n\delta) \tag{90} \\
&\le P(\tilde{S}_n < -n\delta/2 + 2K_n) \tag{91}
\end{align}

is small, where the last line follows from (87) and the assumption that $\gamma < \delta/2$.

It is straightforward to verify that for $\epsilon_{x,k} = 1$, $x \ne x_{\mathrm{off}}$, $\tilde{S}_n$ is a martingale by checking that
$E[\tilde{S}_n | \tilde{S}_0, \ldots, \tilde{S}_{n-1}] = \tilde{S}_{n-1}$. Furthermore, it has bounded increments. Thus, a bounded martingale
concentration inequality [22, p. 57, Corollary 2.4.7] implies that for large enough $n$,

HS n < -nS/2 + 2K n ) < exp |-nD { ^ + ^ 2 ~ n~ x K n | . (92) 

The result follows immediately. ■ 
Lemma 6: Given 5 > 0, ecb{ei,fc}fcLi> consider the fixed-rate protocol with n n = o(K n ),K n = 
o(n),K n — > 00 as n —>■ 00, and 7 < 5/2. Then there exists a choice of 5 such that this strategy attains 
rates of at least 

1 - J ■ C* - 5 , (93) 



1 - eo 

with probability going to 1 as $n \to \infty$, where $C^* = \max_{p(x)} I(X;Y)$.
Proof: By Lemma 5, we know that with probability going to 1 as $n \to \infty$, at least

'-^-^ - S) n (94) 
of the frames will be active frames. The rate for each active frame is 

Kn~ «n (c , _ ~ d) ^ (95) 



so we have that 



i - e y if, 



> ( 1 ^-^ IC'-J-C/Cn/gn + ^-C* . (96) 



For large enough n, by assumption K n /K n < S/C*, so that rates of at least 

1 — R p — eo 



, C* - 25 - 5 ■ C* (97) 
1 - eo / 

are achievable with probability going to 1. By choosing 5 = | min{<5, 5/C*}, we can conclude the result. 



Appendix II 
Proof of Theorem 2 

A. Decoder Error 

Lemma 7: Define error events as follows:

1) $E_1$: $\exists$ frame in which the decoder misidentifies whether a frame is active or silent.

2) $E_2$: the decoder misidentifies the selected codebook.

3) $E_3$: $\exists$ an active frame in which the decoder misidentifies the codeword.

Then, for $\delta > 0$, $C_n = \lfloor n^{1/32} \rfloor$, $\kappa_n = \lfloor n^{1/16} \rfloor$, $K_n = \lfloor n^{1/8} \rfloor$ in the codebook-adaptive protocol, as
$n \to \infty$,

\[ P(E_1 \cup E_2 \cup E_3) \to 0 . \tag{98} \]

Proof: Note that we can bound the error as

\[ P(E_1 \cup E_2 \cup E_3) \le P(E_1) + P(E_2 | E_1^c) + P(E_3 | E_1^c, E_2^c) . \tag{99} \]

First consider the event of misidentifying whether transmission is taking place over a frame. There are
$n/K_n$ frames, and an error occurs if misidentification happens over any one of them. Thus, by taking a union
bound over all frames and applying Lemma 15 for the error of the repetition code used to distinguish
these frames, the error probability is bounded by

\[ P(E_1) \le \frac{n}{K_n}\, e^{-\kappa_n \cdot r} , \tag{100} \]

where $r > 0$ is independent of $n$. Another source of error is misidentifying the codebook. For large
enough n, (C n + 1)1*1 does not exceed C — 5, and we can apply Lemma 14 to get the error probability 

¥{E 2 \E{) < e ~ 8/^+4(1°"^^ . (101) 

Finally, we can consider the error corresponding to misidentifying the codewords sent in each frame. By
Lemma 14 and a union bound per frame, we also have that

W>(E 3 \E%,El) <~^-e iTT^Tio^w? . (102) 

Combining (100), (101), and (102) with (99), we complete the proof. ■ 

B. Rate Analysis 

The rate loss argument is the most tedious because one must account for a variety of factors: the 
length of the first two phases of transmission, the gap between the rates of quantized set of codebooks 
and points on the RIB function, and the number of active frames in Phase III. We therefore subdivide 
the result into several lemmas. 

1) Phase I and II are short: Because the encoder does not send message information in Phase I and 
II, we want the length of these phases to be sublinear in n to guarantee negligible rate loss. 

Lemma 8: For all $\nu > 0$, let $\gamma < \nu/2$, $\kappa_n = \lfloor n^{1/16} \rfloor$, $K_n = \lfloor n^{1/8} \rfloor$ in the codebook-adaptive protocol.
Furthermore, let $T$ be the length of Phases I and II and $E_1 = \{T > n^{1/4}\}$. Then

\[ P(E_1) \to 0 \tag{103} \]

as $n \to \infty$.

Proof: Consider the transition times from silent frames to active frames and vice versa. To do this,
define the stopping times for $k \ge 1$,

\begin{align}
N_{2k-1} &= K_n \cdot \inf\{i > K_n^{-1} N_{2k-2} : S_{i \cdot K_n} - i \cdot K_n \cdot \gamma - K_n \ge 0\} , \tag{104} \\
N_{2k} &= K_n \cdot \inf\{i > K_n^{-1} N_{2k-1} : S_{i \cdot K_n} - i \cdot K_n \cdot \gamma - K_n < 0\} , \tag{105}
\end{align}

where $N_0 = 0$. Phases I and II end after the first two active frames. We can get a bound on the start of
the first active frame immediately from Lemma 2, which implies

\[ P(N_1 > t | S_0 = 0) \le \left( \frac{(1 - R_p - \gamma)(1 - \epsilon_0)}{(R_p + \gamma)\epsilon_0} \right)^{2K_n} e^{-t D(1 - R_p - \gamma \| \epsilon_0)} . \tag{106} \]

Thus, if $N_2 > N_1 + K_n$, then

\[ T = N_1 + 2K_n . \tag{107} \]

Together with (106), this implies

\[ P(T > n^{1/4} | N_2 > N_1 + K_n) \le \left( \frac{(1 - R_p - \gamma)(1 - \epsilon_0)}{(R_p + \gamma)\epsilon_0} \right)^{2K_n} e^{-(n^{1/4} - 2K_n) D(1 - R_p - \gamma \| \epsilon_0)} . \tag{108} \]

The remaining case to consider is if $N_2 = N_1 + K_n$. If this happens, then Phases I and II end at

\[ T = N_3 + K_n . \tag{109} \]

Then (109) implies

\begin{align}
P(T > n^{1/4} | N_2 = N_1 + K_n) &= P(N_3 > n^{1/4} - K_n | N_2 = N_1 + K_n) \tag{110} \\
&= P(N_3 - N_2 > n^{1/4} - 2K_n - N_1 | N_2 = N_1 + K_n) \tag{111} \\
&\le P(N_3 - N_2 > n^{1/4} - 2K_n - N_1 \text{ or } N_1 > n^{1/4}/2 \,|\, N_2 = N_1 + K_n) \tag{112} \\
&\le P(N_1 > n^{1/4}/2 \,|\, N_2 = N_1 + K_n) \notag \\
&\quad + P(N_3 - N_2 > n^{1/4}/2 - 2K_n \,|\, N_2 = N_1 + K_n,\ N_1 \le n^{1/4}/2) , \tag{113}
\end{align}

where (110) follows from (109), (111) follows by our conditioning, (112) follows since we are increasing
the possible events over which we are taking the probability, and (113) follows from $P(A \text{ or } B) =
P(A) + P(A^c)P(B | A^c)$.

By Lemma 13 and Lemma 2,

\[ P(N_3 - N_2 > n^{1/4}/2 - 2K_n \,|\, N_2 = N_1 + K_n,\ N_1 \le n^{1/4}/2) \le \left( \frac{(1 - R_p - \gamma)(1 - \epsilon_0)}{(R_p + \gamma)\epsilon_0} \right)^{2K_n} e^{-(n^{1/4}/2 - 2K_n) D(1 - R_p - \gamma \| \epsilon_0)} . \tag{114} \]

By combining (106), (113), and (114),

\begin{align}
P(T > n^{1/4} | N_2 = N_1 + K_n) &\le \left( \frac{(1 - R_p - \gamma)(1 - \epsilon_0)}{(R_p + \gamma)\epsilon_0} \right)^{2K_n} e^{-(n^{1/4}/2) D(1 - R_p - \gamma \| \epsilon_0)} \notag \\
&\quad + \left( \frac{(1 - R_p - \gamma)(1 - \epsilon_0)}{(R_p + \gamma)\epsilon_0} \right)^{2K_n} e^{-(n^{1/4}/2 - 2K_n) D(1 - R_p - \gamma \| \epsilon_0)} . \tag{115}
\end{align}

The result follows immediately from (108) and (115). ■
2) Codebook Quantization: To account for the error in the interference estimates and the fact that the encoder
must inform the decoder of which rate it will be targeting, we only have a limited number of codebooks
to choose from at the start of Phase II. Thus, in general there will be a gap between the rate of a selected
codebook and an actual point on the RIB function. In this subsection, we ensure that this gap is small.
We first provide guarantees on accurate interference estimates.

Lemma 9: Let $\mathcal{D}_\ell$ be the event that Phase I of the codebook-adaptive protocol terminates at frame $\ell$.
Then

\[ P\Big( \max_x |\hat{\epsilon}_x - \epsilon_x| > \delta \,\Big|\, \mathcal{D}_\ell \Big) \le 2|\mathcal{X}|\, e^{-\mu\delta^2/2} , \tag{116} \]

where $\mu = \lfloor (K_n - \kappa_n)/|\mathcal{X}| \rfloor$.

Proof: Recall the definition of the estimates given in (29). Then $\mathcal{D}_\ell = \{V_1 = (\ell - 1)K_n\}$ is an
equivalent expression for the event. By Hoeffding's inequality [22, p. 57, Corollary 2.4.7], we have for
each $x \in \mathcal{X}$

\[ P(|\hat{\epsilon}_x - \epsilon_x| > \delta \,|\, V_1 = (\ell - 1)K_n) \le 2 e^{-\mu\delta^2/2} . \tag{117} \]

The result then follows from a union bound on $P(\max_x |\hat{\epsilon}_x - \epsilon_x| > \delta \,|\, \mathcal{D}_\ell)$. ■
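The concentration of the Phase I estimates can be checked empirically. The Python sketch below estimates, by Monte Carlo, the probability that an empirical erasure frequency over `mu` channel uses deviates from the true value by more than `delta`, and compares it with the textbook two-sided Hoeffding bound $2e^{-2\mu\delta^2}$. That constant is the standard one for Bernoulli averages and is an assumption here; it may differ from the constant used in the lemma. All names and parameter values are illustrative.

```python
import math
import random

def empirical_deviation_prob(eps, mu, delta, trials, seed=0):
    """Monte Carlo estimate of P(|eps_hat - eps| > delta), where eps_hat
    is the fraction of erasures observed over mu independent channel uses."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        erasures = sum(1 for _ in range(mu) if rng.random() < eps)
        if abs(erasures / mu - eps) > delta:
            bad += 1
    return bad / trials

def hoeffding_bound(mu, delta):
    """Standard two-sided Hoeffding bound for the mean of mu variables in [0, 1]."""
    return 2.0 * math.exp(-2.0 * mu * delta ** 2)

if __name__ == "__main__":
    mu, delta = 200, 0.1
    print("empirical:", empirical_deviation_prob(0.3, mu, delta, trials=2000))
    print("bound:    ", hoeffding_bound(mu, delta))
```

A union bound over the $|\mathcal{X}|$ inputs then multiplies the per-input bound by $|\mathcal{X}|$, which is the step used at the end of the proof.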
Lemma 10: For the selected codebook $\chi$ given in (31), define $E_2^c$ as the event where the following
two conditions are met:

\[ \sum_x \epsilon_x p_\chi(x) \le 1 - R_p - 2\gamma \tag{118} \]

\ x \ , 2 3i , 5 

6 ^ log c^\ + w\ log m r ly\ 

<R x -Rm(e,R P + 2 7 + 5)< (119) 
~ 2\X\ ° g 2\X\ 2 ■ \y\'
Then for all $1 > \delta > 0$, $C_n > 4|\mathcal{X}|$, and as $n \to \infty$,

\[ P(E_2 | E_1^c) \to 0 , \tag{120} \]

where $E_1$ is defined in Lemma 8.

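Proof: Given $E_1^c$, we can guarantee by applying Lemma 9 with threshold $\delta/(4|\mathcal{X}|^2)$ that

$$\sum_x |\hat{\epsilon}_x - \epsilon_x| \le \frac{\delta}{4|\mathcal{X}|} \qquad (121)$$

with probability going to 1 as $n \to \infty$. Furthermore, we know that by definition

$$\sum_x \hat{\epsilon}_x p_\chi(x) \le 1 - R_p - 2\gamma - \delta. \qquad (122)$$

From (121) and (122), we have that

$$\sum_x \epsilon_x p_\chi(x) \le \sum_x \hat{\epsilon}_x p_\chi(x) + \frac{\delta}{4|\mathcal{X}|} \qquad (123)$$

$$\le 1 - R_p - 2\gamma. \qquad (124)$$

It remains to verify the other condition. Note that for any $p(x)$, there is a codebook in the set with input
distribution type $p_\chi(x)$ such that $\sum_x |p(x) - p_\chi(x)| \le 2|\mathcal{X}|/C_n$. Then by the continuity of entropy [23,
Lemma 2.7, p. 33], we know that

$$R_{IB}(\hat{\epsilon}, R_p + 2\gamma + \delta) + 6\frac{|\mathcal{X}|}{C_n}\log\frac{2}{C_n|\mathcal{Y}|} \le R_\chi \le R_{IB}(\hat{\epsilon}, R_p + 2\gamma + \delta), \qquad (125)$$

where the inequality on the right follows from (122) and the definition of the $R_{IB}$ function. Lemma 18
and (121) imply that

$$\left|R_{IB}(\epsilon, R_p + 2\gamma + \delta) - R_{IB}(\hat{\epsilon}, R_p + 2\gamma + \delta)\right| \le -\frac{3\delta}{2|\mathcal{X}|}\log\frac{\delta}{2|\mathcal{X}|^2|\mathcal{Y}|}. \qquad (126)$$

Combining (125) and (126) yields the result. ■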
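
The quantization step in the proof relies on every input distribution having a nearby type with denominator $C_n$. The following is a minimal sketch of one way to find such a type, assuming largest-remainder rounding; the function names and the example distribution are hypothetical. Each coordinate moves by less than $1/C_n$, so the L1 distance shrinks like $|\mathcal{X}|/C_n$.

```python
from fractions import Fraction

def nearest_type(p, c):
    """Round distribution p to a type with denominator c (largest remainder)."""
    scaled = [pi * c for pi in p]
    counts = [int(s) for s in scaled]          # floor, since entries are >= 0
    leftover = c - sum(counts)
    # Give the leftover mass to the entries with the largest fractional parts.
    order = sorted(range(len(p)),
                   key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return [Fraction(k, c) for k in counts]

def l1_distance(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

p = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)]   # hypothetical target
c = 16                                                  # plays the role of C_n
q = nearest_type(p, c)
print(q, l1_distance(p, q))
```

Exact rational arithmetic is used here so that the rounding never suffers from floating-point drift; in practice a floating-point version would behave the same up to tolerance.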
3) Always On: Our next lemma shows that all frames are active after time $\sqrt{n}$ with probability going
to 1 as $n \to \infty$.

Lemma 11: Let $E_1$ and $E_2$ be defined as in Lemmas 8 and 10, respectively. Define $E_3$ to be the event
that for some $j > \sqrt{n}$, the condition in (26) is met, resulting in a silent frame. Then for all $\nu > 0$
and $\epsilon_{x,j} = \epsilon_x$ for $x \ne x_{\text{off}}$, the codebook-adaptive protocol with parameters $(\gamma, C_n, \tilde{K}_n, K_n, \delta)$ satisfying

$0 < \gamma < \nu/2$, $C_n = [n^{1/32}]$, $\tilde{K}_n = [n^{1/16}]$, $K_n = [n^{1/8}]$, $1 > \delta > 0$, has the property that as $n \to \infty$,

$$P(E_3 \mid E_1^c, E_2^c) \to 0. \qquad (127)$$
Proof: We start by defining

$$S_j = S_{j-1} + A_j - E[A_j \mid S_0, \ldots, S_{j-1}], \quad S_0 = 0, \qquad (128)$$

and it is easy to verify that $S_j$ is a bounded martingale. From the definition of $E_2^c$ in Lemma 10 and for
$\nu > 0$, (118), (26), and (33) imply that for $j$ satisfying $j > V_2 + K_n$ and $\ell K_n \ge j > (\ell - 1)K_n + \tilde{K}_n$
for some integer $\ell \ge 1$,

$$E[A_j \mid S_0, \ldots, S_{j-1}] \ge R_p + 2\gamma. \qquad (129)$$

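Then for $k > n^{1/2}$ and under $E_1^c$,

$$P\left(k^{-1}\sum_{i=1}^{k} A_i < R_p + \gamma + k^{-1}K_n \,\middle|\, E_1^c, E_2^c\right) \le P\left(k^{-1}S_k < R_p + \gamma + k^{-1}K_n - k^{-1}\tilde{n}(R_p + 2\gamma)\right) \qquad (130)$$

$$\le P\left(k^{-1}S_k < -\gamma + o(1)\right), \qquad (131)$$

where $\tilde{n} = (k - n^{1/4}) \cdot \frac{K_n - \tilde{K}_n}{K_n}$; $o(1)$ is notational convenience for $\lim_{n \to \infty} o(1) = 0$, and (130) follows
from (128), (129), and since for all $i$, $A_i \ge 0$ almost surely. Since $S_k$ is a zero-mean bounded martingale,
for $k > n^{1/2}$ and large enough $n$, we can apply a bounded martingale concentration inequality [22, p.
57, Corollary 2.4.7] to yield

$$P\left(k^{-1}\sum_{i=1}^{k} A_i < R_p + \gamma + k^{-1}K_n \,\middle|\, E_1^c, E_2^c\right) \le \exp\left(-k(\gamma + o(1))^2/2\right) \qquad (132)$$

$$= \exp\left(-\lceil\sqrt{n}\rceil(\gamma + o(1))^2/2\right) \cdot \exp\left(-(k - \lceil\sqrt{n}\rceil)(\gamma + o(1))^2/2\right). \qquad (133)$$

From the above result and a union bound,

$$P(E_3 \mid E_1^c, E_2^c) \le \exp\left(-\lceil\sqrt{n}\rceil(\gamma + o(1))^2/2\right) \cdot \sum_{k \ge \lceil\sqrt{n}\rceil} \exp\left(-(k - \lceil\sqrt{n}\rceil)(\gamma + o(1))^2/2\right) \qquad (134)$$

$$\le \exp\left(-\lceil\sqrt{n}\rceil(\gamma + o(1))^2/2\right) \cdot \sum_{m=0}^{\infty} \exp\left(-m(\gamma + o(1))^2/2\right). \qquad (135)$$

However, the geometric series does not affect the error probability by more than a constant asymptotically,
so taking the limit above completes the result. ■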
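
To see why the geometric series contributes only a constant, the following minimal sketch evaluates a bound of this shape with the $o(1)$ terms dropped. Rounding $\sqrt{n}$ up and the specific value of `gamma` are assumptions made for illustration; the function name is hypothetical.

```python
import math

def silent_frame_bound(n, gamma):
    """Leading factor exp(-ceil(sqrt(n)) * gamma^2 / 2) times the
    closed form of the geometric series sum_{m >= 0} exp(-m * gamma^2 / 2)."""
    c = gamma ** 2 / 2
    lead = math.exp(-math.ceil(math.sqrt(n)) * c)
    series = 1.0 / (1.0 - math.exp(-c))   # does not depend on n
    return lead * series

gamma = 0.1
for n in (10**4, 10**6, 10**8):
    print(n, silent_frame_bound(n, gamma))
```

The series factor is fixed once `gamma` is fixed, so the exponentially decaying leading factor eventually dominates, although for small `gamma` the bound is vacuous (larger than 1) until `n` is quite large.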
4) Overall Rate Loss:

Lemma 12: For all v > 0, e x ^ = e x for x / x s and given any 5 > 0, consider the codebook-adaptive 
protocol with parameters (7, C n , n n , K n ) satisfying < 7 < min{j//2,i/2},C„ = l/^ 32 ] , = 
[n 1 / 16 ] , K n = [n 1 ^ 8 ]- Then there exists a choice of the parameter 6 £ (0, 1/8) so that with probability 
going to 1 as n — > 00, the cognitive radio achieves rates 

R>Rm{e,Rp)-S. (136) 
Proof: Let $E_1$, $E_2$, $E_3$ be defined as in Lemmas 8, 10, and 11, respectively. From these results, we
know that

$$P(E_1 \cup E_2 \cup E_3) \to 0 \qquad (137)$$

as $n \to \infty$, and thus with high probability,

1) Phases I and II are short, ending by $n^{1/4}$ (Lemma 8).

2) The gap between the codebook's rate and the $R_{IB}$ function is small (Lemma 10).

3) After time $n^{1/2}$, all frames are active frames (Lemma 11).

Furthermore, the repetition code costs $\tilde{K}_n$ positions out of every frame of length $K_n$. Factoring in
this source of rate loss along with the fact that we are in Phase III by
time $n^{1/2}$ (Lemma 8) and always in an active frame (Lemma 11), the rate

n — \fn K n — K r , 



n K n 



>( J Rx-^)-(^ 1/2 + ^)logl^l (138) 
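is achievable for the cognitive radio with probability going to 1 as $n \to \infty$. Finally, we know that the
gap between the codebook's rate and the $R_{IB}$ function is small (Lemma 10), so

$$R_\chi \ge R_{IB}(\epsilon, R_p + 2\gamma + \delta) + 6\frac{|\mathcal{X}|}{C_n}\log\frac{2}{C_n|\mathcal{Y}|} + \frac{3\delta}{2|\mathcal{X}|}\log\frac{\delta}{2|\mathcal{X}|^2|\mathcal{Y}|} \qquad (139)$$

$$\ge R_{IB}(\epsilon, R_p + 2\delta) + 6\frac{|\mathcal{X}|}{C_n}\log\frac{2}{C_n|\mathcal{Y}|} + \frac{3\delta}{2|\mathcal{X}|}\log\frac{\delta}{2|\mathcal{X}|^2|\mathcal{Y}|} \qquad (140)$$

$$\ge R_{IB}(\epsilon, R_p) + 6\frac{|\mathcal{X}|}{C_n}\log\frac{2}{C_n|\mathcal{Y}|} + \frac{3\delta}{2|\mathcal{X}|}\log\frac{\delta}{2|\mathcal{X}|^2|\mathcal{Y}|} + 12\delta\log\frac{4\delta}{|\mathcal{X}||\mathcal{Y}|}, \qquad (141)$$

where (140) follows by our assumption about $\gamma$ and (141) from Lemma 17 given our assumption about
$\delta$. Combining (138) with (141), our assumptions about $(C_n, \tilde{K}_n, K_n)$ imply that for large enough $n$, the
rate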

«.<?, *.) - 1 - (i + J. >og 3» + log !*H2!1) <i42) 
is achievable for the cognitive radio with probability going to 1 as $n \to \infty$. One can now observe that the
parenthetical term in (142) vanishes as $\delta$ goes to 0, so choosing $\delta \in (0, 1/8)$ such that this parenthetical
term is smaller than the slack remaining in (136) completes the proof. ■
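
The two multiplicative losses named in the proof (time spent before Phase III and positions spent on the repetition code) can be tabulated directly. The following is a minimal sketch, assuming $n = 2^{16m}$ so that the repetition length $n^{1/16}$, the frame length $n^{1/8}$, and $\sqrt{n}$ are exact integers; the function name and the choice of which exponent belongs to the repetition code are assumptions made for illustration.

```python
def usable_fraction(m):
    """Fraction of n = 2^(16 m) channel uses left after both losses."""
    n = 2 ** (16 * m)
    k_rep = 2 ** m            # n^(1/16): repetition positions per frame
    k_frame = 2 ** (2 * m)    # n^(1/8): frame length
    sqrt_n = 2 ** (8 * m)     # Phase III is reached by time sqrt(n)
    return (n - sqrt_n) / n * (k_frame - k_rep) / k_frame

for m in (1, 2, 4, 8):
    print(m, usable_fraction(m))
```

The fraction tends to 1, but slowly: the per-frame overhead decays only like $n^{-1/16}$, which is why the result is asymptotic in nature.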

Appendix III 
Technical Lemmas 

This appendix contains a series of self-contained technical lemmas that are included here for
completeness. It is likely that many of them exist elsewhere in the literature, but we were unable to find the
references.

A. A Markov Property 

Lemma 13: Let Aj be i.i.d. Bernoulli-p random variables such that p G (0, 1). Define stopping times 

i 

N 2 k-i = r ■ mf{i > r^N 2k ^ 2 : 

3=1 

i 

N 2k =r- mf{i > r^iV^-i : A i 

3=1 

where Nq = 0. Then for all real r, q and integers r, and on the event {N 2 k < oo}, 

P(^2fc+i - N 2k > I ■ r\N 2k < oo) = P(ATi > I ■ r\r - {q + 1) ■ r < S < r) . (145) 
Proof: Define Si = Yl)=i Ai — i ■ q ■ r and note that it is Markov. 

P(iV 2fc+1 -iV 2fc >£-r|iV2 fe <oo) 

i-r- 1 

= nS^ 2k+1 <r\r-(q + l)-r< < r, N 2k < oo) J] P(% fc+m+1 < r\S^ +m < r, N 2k < oo) 

m=l 

(146) 

i-r-l 

= P(5i < r|r - (q + 1) • r < S < r, N 2k < oo) J] P(5 m+1 < r|5 m < r) (147) 

m=l 

= P(iVi > £■ r|r- (g+ 1) • r < S < r) (148) 

where (146) follows from the definition of the stopping times and the fact that Si is Markov, (147) from 
the strong Markov property [21, p. 285, Theorem 5.2.4], and (148) by the definition of the stopping time. 



i ■ 1 > t} , 
i ■ Q < t} , 



(143) 
(144) 
B. Hypothesis Testing and Codebook Errors

Lemma 14: Let $\mathcal{C}$ be a random codebook for the DMC $P_{Y|X}(y|x)$ with $2^{nR}$ codewords where the
codewords are generated independently according to the distribution

$$P_{X^n}(x^n) = \prod_{i=1}^{n} P_X(x_i). \qquad (149)$$

Let $C = I(X;Y)$. If maximum likelihood decoding is used at the decoder, then for $R < C$, the error
probability $P_{\text{error}}$ is bounded from above by

F(W ^ W) < e "^/^ ( +4aog|yi)^ . (150) 
Proof: The result is based on an exercise in Gallager's book [24, p. 539, Problem 5.23], which in 
turn derives from a result in that text [24, p. 138, Theorem 5.6.2]. While there is an error in the derivation 
outlined in that exercise, a corrected proof is given in [25]. ■ 
Lemma 15: Let $p_1(y)$ and $p_2(y)$ be two probability distributions such that $p_1(y) \ne p_2(y)$ for at least
one $y \in \mathcal{Y}$. Given $t$ samples of one of these distributions, there exists a (random) hypothesis test $T$
such that

$$P\left(T(Y^t) \ne i \mid Y^t \text{ selected iid from } p_i\right) \le e^{-t \cdot r}, \qquad (151)$$

where $r > 0$ is some constant.

Proof: The following is simply a variation on Stein's lemma. We will construct a randomized
hypothesis test that gives this performance. For each sample, independently choose with probability
$\lambda \in (0, 1)$ a symbol uniformly over the alphabet; with probability $1 - \lambda$, choose the sample. This
generates a new sequence of independent random variables $\tilde{Y}_i$ with distribution

$$\tilde{p}_i(y) = (1 - \lambda)p_i(y) + \frac{\lambda}{|\mathcal{Y}|}. \qquad (152)$$

Note that our assumptions imply that $\tilde{p}_1(y) \ne \tilde{p}_2(y)$ for at least one $y \in \mathcal{Y}$, and thus

$$D(\tilde{p}_1 \| \tilde{p}_2) > 0, \quad D(\tilde{p}_2 \| \tilde{p}_1) > 0. \qquad (153)$$

Now observe that

$$\log\frac{\tilde{p}_2(y)}{\tilde{p}_1(y)} \le \log\frac{|\mathcal{Y}|}{\lambda}, \qquad (154)$$

$$\log\frac{\tilde{p}_1(y)}{\tilde{p}_2(y)} \le \log\frac{|\mathcal{Y}|}{\lambda}. \qquad (155)$$
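Using this, we construct the random hypothesis test $T$ as follows. Let

$$Z_i = \log\frac{\tilde{p}_1(\tilde{Y}_i)}{\tilde{p}_2(\tilde{Y}_i)}. \qquad (156)$$

We now define our hypothesis test to be

$$T(Y^t) = \begin{cases} 1 & \text{if } \sum_{i=1}^{t} Z_i > 0 \\ 2 & \text{otherwise.} \end{cases} \qquad (157)$$

If $\sum_{i=1}^{t} Z_i > 0$, $T$ maps to 1. Otherwise, it maps to 2. When distribution 2 is the true one, we express
the error probability as

$$P\left(\sum_{i=1}^{t} Z_i > 0 \,\middle|\, \text{selected from } p_2\right) = P\left(t^{-1}\sum_{i=1}^{t} Z_i + D(\tilde{p}_2 \| \tilde{p}_1) > D(\tilde{p}_2 \| \tilde{p}_1) \,\middle|\, \text{selected from } p_2\right). \qquad (158)$$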
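By Hoeffding's inequality [22, p. 57, Corollary 2.4.7], we can bound this probability by

$$P\left(t^{-1}\sum_{i=1}^{t} Z_i + D(\tilde{p}_2 \| \tilde{p}_1) > D(\tilde{p}_2 \| \tilde{p}_1) \,\middle|\, \text{selected from } p_2\right) \le \exp\left(-t\left(\frac{D(\tilde{p}_2 \| \tilde{p}_1)}{\log(|\mathcal{Y}|/\lambda)}\right)^2 / 2\right). \qquad (159)$$

By a similar argument, one can also show that

$$P\left(t^{-1}\sum_{i=1}^{t} Z_i - D(\tilde{p}_1 \| \tilde{p}_2) \le -D(\tilde{p}_1 \| \tilde{p}_2) \,\middle|\, \text{selected from } p_1\right) \le \exp\left(-t\left(\frac{D(\tilde{p}_1 \| \tilde{p}_2)}{\log(|\mathcal{Y}|/\lambda)}\right)^2 / 2\right). \qquad (160)$$

Setting $r = \frac{1}{2}\left(\frac{\min\{D(\tilde{p}_1 \| \tilde{p}_2),\, D(\tilde{p}_2 \| \tilde{p}_1)\}}{\log(|\mathcal{Y}|/\lambda)}\right)^2$ completes the proof. ■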
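
The construction in this proof is short enough to run directly. The following is a minimal sketch, assuming natural logarithms and a three-symbol alphabet; the distributions, the mixing probability `lam`, and all function names are hypothetical. The `exponent` helper assumes the Hoeffding form in which the log-likelihood increments are bounded in magnitude by $\log(|\mathcal{Y}|/\lambda)$.

```python
import math
import random

def smooth(p, lam):
    """Mix p with the uniform distribution: (1 - lam) * p + lam / |Y|."""
    return [(1 - lam) * py + lam / len(p) for py in p]

def randomized_test(samples, p1, p2, lam, rng):
    """Return 1 or 2 from the sign of the smoothed log-likelihood-ratio sum."""
    q1, q2 = smooth(p1, lam), smooth(p2, lam)
    total = 0.0
    for y in samples:
        if rng.random() < lam:            # replace by a uniform symbol
            y = rng.randrange(len(p1))
        total += math.log(q1[y] / q2[y])
    return 1 if total > 0 else 2

def exponent(p1, p2, lam):
    """Assumed exponent: (min divergence / log(|Y| / lam))^2 / 2."""
    q1, q2 = smooth(p1, lam), smooth(p2, lam)
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b))
    bound = math.log(len(p1) / lam)
    return 0.5 * (min(kl(q1, q2), kl(q2, q1)) / bound) ** 2

rng = random.Random(1)
p1, p2 = [0.8, 0.2, 0.0], [0.2, 0.0, 0.8]   # hypothetical distributions
lam, t = 0.1, 100
from_p1 = rng.choices(range(3), weights=p1, k=t)
from_p2 = rng.choices(range(3), weights=p2, k=t)
decision_1 = randomized_test(from_p1, p1, p2, lam, rng)
decision_2 = randomized_test(from_p2, p1, p2, lam, rng)
r = exponent(p1, p2, lam)
print(decision_1, decision_2, r)
```

The smoothing step is what keeps every log-likelihood ratio finite even though `p1` and `p2` have zeros in different places, which is exactly what makes a bounded-increment concentration inequality applicable.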



C. Monotonicity, Concavity, and Continuity of a Cost-Constrained Capacity 

Lemma 16: Let $0 \le \epsilon_x \le 1$ for all $x$ and define $\epsilon_0 = \min_x \epsilon_x$, which is achieved uniquely by some
$x$. Then

$$C(\epsilon, \Lambda) = \max_{p(x):\ \sum_x \epsilon_x p(x) \le \Lambda} I(X;Y) \qquad (161)$$

is nondecreasing and concave in $\Lambda$ on the interval $[\epsilon_0, 1]$.

Proof: Since increasing $\Lambda$ increases the set of channel input distributions over which to maximize,
it is clear that $C(\epsilon, \Lambda)$ is nondecreasing. As a convenient shorthand, let $I_p = I(X;Y)$ denote the mutual
information with the input distribution $p$. Let $p_1$ be the maximizing input distribution for $C(\epsilon, \Lambda_1)$ and
$p_2$ the maximizing input distribution for $C(\epsilon, \Lambda_2)$, both of which are guaranteed to exist for $\Lambda_i \in [\epsilon_0, 1]$,
$i \in \{1, 2\}$. Then for any $\rho \in [0, 1]$,

$$(1 - \rho)C(\epsilon, \Lambda_1) + \rho C(\epsilon, \Lambda_2) = (1 - \rho)I_{p_1} + \rho I_{p_2} \qquad (162)$$

$$\le I_{(1 - \rho)p_1 + \rho p_2} \qquad (163)$$

$$\le C(\epsilon, (1 - \rho)\Lambda_1 + \rho\Lambda_2), \qquad (164)$$

where (163) follows from the concavity of mutual information with respect to its input distribution and
(164) by definition, since the mixture $(1 - \rho)p_1 + \rho p_2$ meets the cost constraint $(1 - \rho)\Lambda_1 + \rho\Lambda_2$. Thus, the function is concave. ■
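
Both properties can be observed on a toy example. The following is a minimal sketch, assuming a binary-input binary symmetric channel with crossover probability `alpha` and a grid search over $P(X = 1)$; the channel, the cost vector `eps`, and the function names are hypothetical stand-ins for the general DMC in the lemma. Rates are in nats.

```python
import math

def binary_entropy(p):
    """Entropy of a Bernoulli(p) variable in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def capacity(eps, budget, alpha=0.1, grid=1000):
    """Grid-search estimate of C(eps, budget) for a binary-input BSC(alpha)."""
    best = 0.0
    for k in range(grid + 1):
        p = k / grid                                   # candidate P(X = 1)
        cost = eps[0] * (1 - p) + eps[1] * p
        if cost <= budget + 1e-12:                     # cost constraint
            out = p * (1 - alpha) + (1 - p) * alpha    # P(Y = 1)
            best = max(best, binary_entropy(out) - binary_entropy(alpha))
    return best

eps = (0.1, 0.6)                                       # hypothetical costs
budgets = [0.1 + 0.05 * j for j in range(11)]          # from eps_0 up to eps_1
values = [capacity(eps, b) for b in budgets]
for b, v in zip(budgets, values):
    print(f"budget={b:.2f}  C={v:.4f}")
```

The printed values rise from zero at the minimum cost, flatten once the unconstrained optimum $P(X = 1) = 1/2$ becomes feasible, and never curve upward, matching the nondecreasing concave shape the lemma describes.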
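Lemma 17: Let $0 \le \epsilon_x \le 1$ for all $x$ and define $\epsilon_0 = \min_x \epsilon_x$, which is achieved uniquely by some
$x$. Consider

$$C(\epsilon, \Lambda) = \max_{p(x):\ \sum_x \epsilon_x p(x) \le \Lambda} I(X;Y), \qquad (165)$$

where $\Lambda \in [\epsilon_0, 1]$ and $C(\epsilon, \Lambda) = 0$ for $\Lambda < \epsilon_0$. Then for $0 < \Delta < \frac{1}{4}$,

$$0 \le C(\epsilon, \Lambda + \Delta) - C(\epsilon, \Lambda) \le -6\Delta\log\frac{2\Delta}{|\mathcal{X}||\mathcal{Y}|}. \qquad (166)$$

Proof: The lower bound follows immediately from Lemma 16. For the upper bound, note that
$C(\epsilon, \epsilon_0) = 0$ since the constraint can only be met by applying all the probability to a single choice of
$x$. Let $\Lambda \ge \epsilon_0$. Let $p_1(x)$ be the maximizing input distribution for $C(\epsilon, \Lambda + \Delta)$. If $p_1(x)$ is found in
the set of valid input distributions for $C(\epsilon, \Lambda)$, then $C(\epsilon, \Lambda + \Delta) = C(\epsilon, \Lambda)$. Otherwise, we can define
$p_2(x)$ such that $\sum_x \epsilon_x p_2(x) = \Lambda$ and observe that

$$\Lambda < \sum_x \epsilon_x p_1(x) \le \Lambda + \Delta, \qquad (167)$$

which implies that

$$0 < \sum_x \epsilon_x (p_1(x) - p_2(x)) \le \Delta, \qquad (168)$$

$$0 < \sum_x (1 - \epsilon_x)(p_2(x) - p_1(x)) \le \Delta. \qquad (169)$$

Thus,

$$\sum_x |p_2(x) - p_1(x)| = \sum_x \epsilon_x |p_2(x) - p_1(x)| + \sum_x (1 - \epsilon_x)|p_2(x) - p_1(x)| \qquad (170)$$

$$\le 2\Delta. \qquad (171)$$

Let $I_p = I(X;Y)$ when the input distribution for $X$ is $p$. Then by the continuity of entropy [23, Lemma
2.7, p. 33],

$$C(\epsilon, \Lambda + \Delta) = I_{p_1} \qquad (172)$$

$$= I_{p_2} + (I_{p_1} - I_{p_2}) \qquad (173)$$

$$\le C(\epsilon, \Lambda) - 3 \cdot 2\Delta\log\frac{2\Delta}{|\mathcal{X}||\mathcal{Y}|}. \qquad (174)$$

This is still valid as an upper bound for $\Lambda < \epsilon_0$ because of the monotonicity of $C(\epsilon, \Lambda)$ from Lemma 16. ■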
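Lemma 18: Let $0 \le \epsilon_x \le 1$ for all $x$ and define $\epsilon_0 = \min_x \epsilon_x$, which is achieved uniquely by some
$x$, so $\epsilon_0 < \epsilon_1 = \max_x \epsilon_x$. Consider

$$C(\epsilon, \Lambda) = \max_{p(x):\ \sum_x \epsilon_x p(x) \le \Lambda} I(X;Y), \qquad (175)$$

where $\Lambda \in [\epsilon_0, 1]$ and $C(\epsilon, \Lambda) = 0$ for $\Lambda < \epsilon_0$. Furthermore, let

$$\sum_x |\tilde{\epsilon}_x - \epsilon_x| \le \Delta < \frac{1}{4}. \qquad (176)$$

Then

$$|C(\tilde{\epsilon}, \Lambda) - C(\epsilon, \Lambda)| \le -6\Delta\log\frac{2\Delta}{|\mathcal{X}||\mathcal{Y}|}. \qquad (177)$$

Proof: The constraint on $p(x)$ in $C(\tilde{\epsilon}, \Lambda)$ can be rewritten as

$$\sum_x \epsilon_x p(x) \le \Lambda - \sum_x (\tilde{\epsilon}_x - \epsilon_x)p(x). \qquad (178)$$

Since $\sum_x (\tilde{\epsilon}_x - \epsilon_x)p(x) \le \sum_x |\tilde{\epsilon}_x - \epsilon_x| \le \Delta$, a tighter constraint on $p(x)$ is $\sum_x \epsilon_x p(x) \le \Lambda - \Delta$, and since
$\sum_x (\tilde{\epsilon}_x - \epsilon_x)p(x) \ge -\sum_x |\tilde{\epsilon}_x - \epsilon_x| \ge -\Delta$, a looser constraint on $p(x)$ is $\sum_x \epsilon_x p(x) \le \Lambda + \Delta$. Thus

$$C(\epsilon, \Lambda - \Delta) \le C(\tilde{\epsilon}, \Lambda) \le C(\epsilon, \Lambda + \Delta). \qquad (179)$$

Then one can write

$$C(\epsilon, \Lambda) - C(\tilde{\epsilon}, \Lambda) = C(\epsilon, \Lambda) - C(\epsilon, \Lambda - \Delta) + C(\epsilon, \Lambda - \Delta) - C(\tilde{\epsilon}, \Lambda) \qquad (180)$$

$$\le C(\epsilon, \Lambda) - C(\epsilon, \Lambda - \Delta), \qquad (181)$$

and similarly,

$$C(\epsilon, \Lambda) - C(\tilde{\epsilon}, \Lambda) = C(\epsilon, \Lambda) - C(\epsilon, \Lambda + \Delta) + C(\epsilon, \Lambda + \Delta) - C(\tilde{\epsilon}, \Lambda) \qquad (182)$$

$$\ge C(\epsilon, \Lambda) - C(\epsilon, \Lambda + \Delta). \qquad (183)$$

Lemma 17 then implies that

$$|C(\tilde{\epsilon}, \Lambda) - C(\epsilon, \Lambda)| \le -6\Delta\log\frac{2\Delta}{|\mathcal{X}||\mathcal{Y}|}. \qquad (184)$$
■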
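
The sandwich in (179) is a statement about nested constraint sets, and it can be observed on a toy example. The following is a minimal sketch, assuming a binary-input binary symmetric channel with crossover probability `alpha` and a grid search over $P(X = 1)$; the channel, both cost vectors, and the function names are hypothetical. Rates are in nats.

```python
import math

def binary_entropy(p):
    """Entropy of a Bernoulli(p) variable in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def capacity(eps, budget, alpha=0.1, grid=1000):
    """Grid-search estimate of C(eps, budget) for a binary-input BSC(alpha)."""
    best = 0.0
    for k in range(grid + 1):
        p = k / grid                                   # candidate P(X = 1)
        if eps[0] * (1 - p) + eps[1] * p <= budget + 1e-12:
            out = p * (1 - alpha) + (1 - p) * alpha    # P(Y = 1)
            best = max(best, binary_entropy(out) - binary_entropy(alpha))
    return best

eps = (0.10, 0.60)          # hypothetical true costs
eps_tilde = (0.12, 0.57)    # hypothetical perturbed costs, L1 error 0.05
delta = 0.05
budget = 0.30

low = capacity(eps, budget - delta)
mid = capacity(eps_tilde, budget)
high = capacity(eps, budget + delta)
print(low, mid, high)
```

Any input distribution that meets the tightened budget under the true costs also meets the nominal budget under the perturbed costs, and any distribution feasible under the perturbed costs meets the loosened budget under the true costs, so the three values come out ordered.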